Saturday, September 24, 2016

Golem: Trustless Distributed P2P Hosting

Project Proposal Draft / RFC


For my independent study, I must design a distributed application which uses the design patterns from the systems I am studying. I have an inkling of where I want this to go. This is the initial abstract of my idea.

I2P and Tor work because at each step, each node has limited visibility. They're only able to see the information that is necessary for them to do their work, and the client expects responses to be encrypted by keys that the current node doesn't have.

This is much like the idea of parametricity in generic programming. When a function takes two arguments, each of which is of a type that is a generic type without constraints, the function is very limited on what it can do. A function which has a type signature of A -> A where A is a generic argument can only be the id function because the function has no way to get an A other than the one which is passed to it.

Similarly, a Tor node which has a job of passing an encrypted message along can't inspect the encrypted message or alter it in any meaningful way (when it can, that's an attack). What if Tor nodes didn't just pass along a message to the next node in the circuit, but were expected to do meaningful work?

Consider the program below:

int foo(int a, string b) {
    int len = strlen(b)
    return max(len, a)
}

Now let Node A only have access to the function foo, Node B only have access to the function strlen, and let Node C only have access to the function max. They don't know they're holding these specific functions, only that they're holding functions which respond to the hashes of function names and the hashes of data constructors. Max and strlen require access to primitive datatypes while foo doesn't, and so only max and strlen will be able to see these.

When someone wants to call foo, Node A can send back a list of the identifiers for Nodes which have a copy of strlen and a list of the nodes which have a copy of max. It doesn't say the function name, but it says the subset of the input which each node will see. As the function cannot see any state not sent by the user (will clarify later), we know that the data sent along will be subsets of the input sent to each function node. Eventually, it will bottom out, or the program will have a trivially-detectable loop in the call graph and we can reject it. The latter can be identified because we will require the use of map, filter, and fold (catamorphisms) to work with the data.

This server list will include a "score" for each node which is made by summing up all of the measured latencies for each node servicing a request, find out the amount that each node contributed to this sum, and then take the inverse of that. These values can split up a number line into slices, where the faster nodes have a larger piece of the slice. The client is thus able to pick a random number and use it to pick a node. The client can force node B to be from a different country than node A to make legal attacks more difficult. These values are stored in a DHT much like the I2P routerinfo tables. This DHT is rate-limited by requiring the user to make a very cheap proof of work on each query, and a more expensive proof of work whenever a server wishes to join the network.

Now if the strlen function needed to call another function, the node that was chosen would negotiate with the client through messages relayed through the node containing foo to return a list of clients to pick. Eventually, the peer would find the transitive closure of the functions which are necessary to call. As the peer knows the shape of the data, data constructor tags can be sent to the functions to predict the flow in order to avoid needlessly picking nodes.

We will require that all control flow be done by recursing over finite data structures (more on that later) which makes it possible to prove that the program cannot enter an infinite loop. A "maximum number of loops" integer can suffice as a hacky data structure, but the data sent by the request or the application state is probably going to determine the flow.

Execution

Now the client will have the nodes route messages to the callers like Tor, to hide user network identity to some degree. The client will use asymmetric encryption to negotiate one-time symmetric keys with each node in the chain. The client now knows which node will need to see which data, and so can encrypt the data for each node. This may require duplication of some of the data, but a well-designed application should be able to minimize the amount of duplication. Encryption schemes which allow multiple keys to decode the same data can also be used here.

Now the node holding foo will see a data constructor that contains key/value pairs for a and b. These names will be removed, the nodes will see only hash identifiers for variable names. It will then send along these encrypted values to the other nodes to get back the encrypted results. These results cannot be inspected by the calling function. All it can do is apply a data constructor (really a hash tag on an ordering of data) and return it to the user. There is a limit on the damage each node can do, as all it sees is data constructor tags and can return references to the DHT. The idea is that *real* security breaches require that a node see multiple pieces of primitive data at once. You can't attack a user without resorting to brute force unless you see all fields of their login, for instance. By recursively splitting up the data, nobody sees enough to figure out how the codebase works or what the user is really doing.

To further make the job of an evil server harder, we will obfuscate function definitions. A function really figures out what to do by analyzing the constructor used for datatypes and dispatching to the jump table for each function. We can combine function bodies as long as the argument data types are entirely different, and can even add entries to the jump table that a user should never send. The computational complexity of extracting a dataflow graph from function bodies made like this is pretty high. It also requires that an adversary see all of the functions. This will never happen.

Supervisors

The system is really a way for a code author and a system user to communicate, using the servers as an untrusted resource. We call the code author a Supervisor. The cluster will have many, many codebases. Each codebase is identified by the public key of the supervisor. The supervisor is responsible for writing their code using an ORM-like interface which exposes the constraints of the system.

This code is then versioned, with the application having versions. A user will include the current version hash in their messages, and servers must match the hash. The hash will be stored in a blockchain block which is signed by the supervisor. This blockchain will hold two types of entries: DHT snapshots and server trust roles.

Global State

Users are responsible for directly querying the DHT to get the desired information that must be used by the servers to process their data. This DHT is really a pseudo-Merkle-tree made by having each DHT-holding server create a Merkle tree on their local machine and sending a cryptographically signed hash of the root node to the blockchain after each time slice. These hashes act like transactions and are mined in blocks. When a User queries the DHT, they get the information required to verify the data is in the Merkle tree, the blockchain-insertion-offset, and the data itself.

We see here that all state is global. This is true to some degree. The DHT requires a HashCash proof of work for queries and inserts, so there is limited visibility. Nobody is going to dump the entire database. Furthermore, old data will be purged from the DHT when the key is rewritten after a block is mined (it sticks around between inserts so that the state transition can be audited). Only the hash of the old data will be immutable. Lastly, I don't think there's anything wrong with this. A user should not put trust in the supervisor to use sensitive data correctly. That is why the servers do not insert the user data into the DHT themselves, the user will see the input and the output in the global state and so can audit their privacy exposure.

The supervisor will be expected to review the blockchain and to sign blocks every now and then. This allows an attentive Supervisor to collect aggregates with some degree of accuracy, without being able to retroactively scrape the database or see the entire system state. The Supervisor is as rate limited as everyone else. We define the most valid application blockchain to be the one with the most supervisor signatures in it. Their job is the final data integrity verifier, as well as the code creator.


Server Administration

Each server will be part of a cluster serving requests for many, many applications. I expect there to be a single pool of computers in Golem, much like a typical cloud hosting platform. These servers will register themselves in the DHT as "open for new applications" if their capacity is below a certain threshold. The DHT nodes will get an "application creation insertion" request and will propagate the message to servers in their local store which are available. Eventually a server will notify the cluster that it is at capacity. Application creations will carry promises of reward with them. This can be done through smart contracts on another blockchain, such as ethereum, where the evidence of work done is enough to get a payout. 

Reaping this reward requires a careful handshake. Servers start by using HashCash to mine themselves a work contract. This proof of work makes the identity of the server expensive to make, and is the asset that the server protects and enriches.

Then comes requesting the hash of the blockchain from the supervisor's DHT application versioning update. They then join the mini-cluster mining this blockchain. Every minute or so, a new block is mined. These blocks hold the DHT references. A server which detects a state transition for a key between two snapshots which isn't valid as per the finite state machine published by the supervisor will reject the transition by appending the rejection to the block being mined. All servers doing this work will duplicate verification. They will then sign this transaction and send the block with the signature to the blockchain with a message to the supervisor. The first server to finish validating gets a tiny reward. Already-mining servers can check that the block uploaded hashes to the one they made locally, and sign the block. They send this signature to the supervisor. The supervisor collects all of the servers they see sign it, verify it is correct themselves (don't need to do all of the time, as the threat of "firing" a server means they can check every few updates), and then sign the server signatures. The supervisor's signatures is the only one that matters for an application blockchain. The valid one is the one with the most signatures from the supervisor. 

If the first server makes a mistake, another server can steal their reward by posting evidence of the correct version to the supervisor along with a copy of the posted (incorrect) block. Servers which make mistakes have their identities destroyed by the Supervisor "tombstoning" their public key, requiring that they redo a HashCash transaction. 

Notice that the first server to mine is the only one rewarded for mining. Where is the motivation for the other servers? Servers which do not mine for more than one timeslice will be marked offline by the application in the global Golem DHT, meaning that they get no traffic. When a user wants to interact with the system, they send the supervisor a request to give them "tokens". These may be free, ad-supported, or paid tokens. These tokens are sent to every single node involved in the transitive closure of an API endpoint request. At the end of a work slice, a server will collect all of these tokens and will place them in a message which is encrypted with the supervisor's public key and signed with the server's identity private key. The supervisor knows what the program flow should be, and so gets a view of the code paths hit by the user without knowing the data that was actually provided. The server signs the tokens in a microtransaction to pay them for this work slice. 

If a server choses to "go rogue" and to send the wrong control flow routing information to a client, then the client will negotiate with the wrong servers and the supervisor will see keys from the wrong nodes. Nodes are picked to be unlikely to be cooperatively compromised. Keys may go missing from servers who leave the network, but it is possible to heuristically prove that data is only missing and that the state transition was correct, not wrong. These tokens allow the servers to be audited for honesty. Like Foucault's panopticon, it is the nonzero probability of their bad behavior being observed that makes bad behavior economically devastating for servers. 

Lastly, a server gets rewards only after having an "identity chain" over a certain length. This equates to an hour of work or so. Making the identity requires mining the HashCash, and each block which the supervisor signs which contains their signature becomes another block in their individual transaction history, or "identity chain." When they want to be paid, they publish the last microtransaction signature from the server and their identity chain's last hash to a smart contract blockchain. The miners can verify the length of the chain and the microtransaction, and will pay out. Cashing out requires the server and supervisor negotiate a new microtransaction. 

Lastly, what about evil supervisors? What if they don't want to pay the server? The fine-grained nature of the system and the microtransactions means that there is a small, bounded work to be done by the server between they get a guarantee of payment. Furthermore if the supervisor incorrectly tombstones their identity, the server has a cryptographically-verified log of the work they did and the state they saw, and can prove that they met the server's function specification due to showing that the application state changed correctly (assuming the User inserted their output into the DHT). There are possible attacks here, but the fact that a server takes time to have their bandwidth "ramp up" in the Golem global network due to the Tor-like bandwidth monitoring of nodes over time, and the small granularity of payments, means that a dishonest supervisor has a small amount of time in which they can behave badly before they get a bad reputation in the Golem network. A bad reputation (many server "complaint" messages, each of which has a HashCash requirement), means that servers will reject work requests and that users will get an inkling that the service-provider has bad business practices. 





Thursday, September 22, 2016

An Internet We Can Agree On: Secured DNS through Bitcoin


The Internet is peculiar place. People will tell you that it is the first thing of value truly owned by everybody and nobody collectively. This could not be farther from the truth. We do not own the Internet because we do not own the means of coordination. 

While the internet is ostensibly a swarm of nameless CPUs with network interfaces, playing IP address roulette is outside the scope of anybody's time. We rely on DNS, which means that we rely on the goodwill of the government of the United States to continue to run ICANN in a non-hostile manner. This trust has been violated numerous times; website domain names have been seized and the sites redirected to this image:



Furthermore, the security of server public key distribution has become a joke that stopped being funny a long time ago. When you visit a website, you're given a chain of certificates. Each certificate proves that an entity trusts the site, and the end of the chain is somebody that you trust absolutely. The problem with this absolute trust is that it is violated quite regularly. Legal pressures and financial incentives have made these authorities give out duplicate certificates in the past such that impersonation becomes practical. We can only expect more and more such demands. [ https://www.schneier.com/academic/archives/2000/01/ten_risks_of_pki_wha.html ] 

If the bedrock of the HTTP-driven Internet is not safe from bad actors, then the entire system is compromised. Most of the major sites you visit have had attacks attempted on them at one time or another in this vein. It's only a matter of time before warrantless SSL-wiretaps become a thing, potentially with malware on the other side of that transmission. 

Namecoin wants to put the DNS and CA system into the hands of the Internet itself by storing the state on an alternate blockchain that manages to borrow the work of Bitcoin mining without sacrificing security.

Threat Model


The threat model of Namecoin is very much like the threat model of Bitcoin, with a few specific differences. The network will be eventually consistent with a linear ordering, where state changes become cryptographically intractable to alter after their blocks have become farther back in the blockchain. Due to the use of Namecoin for identity manipulation, there are extra attacks to worry about. 

If an attacker sees you trying to put a block into the network that registers "myMillionDollarName," then they could decide that they want this name instead. They might want to sell it back to you at a high price. Namecoin solves this by separating the registration process into two steps. At first, the user inserts a block into the network that contains only the hash of the name that you want. After this hash is in a block at least 12 blocks away from the newest mined block, namecoin will allow you to submit a "first update" transaction that exposes the actual name and information. Future updates can be made to change where the address points, and to renew it before it expires. 

What if an attacker wants to register all alphanumeric names, and become the new ICANN by selling them to you at a steep premium and under a strict contract. This becomes very difficult. An attacker must spend money on the registration transaction fee, but must also spend money on a registration fee when they send their first "block registration" block. This registration fee is destroyed, it cannot be somehow revived and reused by anybody. There are no registration fees for renewals or updates, but a transaction fee does apply. You have to renew or update a name every 35,999 blocks at the latest (between 200 and 250 days), otherwise it expires. This makes holding on to a lot of domains expensive.

On the other hand, Namecoin offers no protection against squatting. A squatter can register "google" and demand a ransom when Google decides to get a namecoin address. There's not much one can do here without requiring out-of-bound validation, which some would consider an attack on the network itself.

Architecture


Namecoin is currently a naming system with two namespaces. One of them is for the identities of people, while the other is for the identities of software systems. 

Identities for people consist of a mapping between usernames and contact information for that identity. The GPG fingerprint can be included in this chain to act as a key registry for email transmission, though the actual keys are not stored to conserve space. 

Domain names are really a key-value store between a string key and a set of attributes that allows someone to interact with the computer behind the domain. This is exactly how Namecoin works. The value is a lot more rich than DNS currently allows. It can register a user, an ip address, an ipv6 address, a Tor hidden service address, an i2p address, a Freenet address, an owner email address, geographic locations for the hosting, a DNS nameserver to forward the lookup to, a key-value store for the locations of subdomains, a fingerprint of a TLS certificate, and a set of round-robin IP addresses to route to for load balancing. These are not disjoint choices, most of these can be placed in the same Namecoin value body and the user can choose how to use the information. 

Actual interaction works currently by having the local Namecoin client act as a DNS server and as a CA authority. It will expose these to the machine, while pulling the necessary data from the blockchain.

TLS works fairly simply, where the name points to a TLS certificate fingerprint. The website on the other end of the network socket will serve up the certificate as part of the handshake, and it can be validated against the blockchain rather than against your government's database. 


Merged Mining


The strength of any blockchain-based technology depends entirely on the number of users the network has. When a blockchain has too few good users, it becomes cheap for an attacker to coordinate enough resources to rewrite history by altering previous blocks and outracing the rest of the network. If Namecoin had used a separate mining population for it's separate blockchain, it would have ran up against these problems. 

The solution is called merged mining. The size of a block and the difficulty of mining the block are a heuristic tradeoff between history immutability and throughput. Let's say that currently a powerful attacker can rewrite history quickly enough to change the last 3 blocks. If it tried the last four, it wouldn't be able to catch up before the network started making more blocks and made it intractable for the attacker. If the block size got larger or the work got easier, then an attacker could reverse more transactions. 

However, merged mining allows Namecoin to securely get the Bitcoin mining population to mine two blocks at once by making them heterogeneous. Rewriting one bitcoin block allows an attacker to rewrite one Namecoin block. Since the difficulty rates for information are similar as before, Namecoin can trust blocks that are old enough in the Bitcoin blockchain.

Whenever Namecoin has enough transactions to mine a block, it will insert the hash into a field of a bitcoin transaction that does not invalidate the transaction, and will have this transaction mined for a standard transaction fee. When this block is mined, the Bitcoin blockchain contains proof-of-work for the Namecoin block. The Namecoin blockchain will now insert this Bitcoin block into the Namecoin blockchain, dropping everything in the Merkle tree not necessary to validate the single mined transaction (to keep the Namecoin blockchain small). This serves as a proof-of-work for Namecoin.


Applicability of the Namecoin Blockchain for Other Applications


Namecoin offers features that may be attractive for other applications. Management of server and person identities is paramount in many computer systems. The promise of a simple, global, shared, auditable user identity system is attractive for many reasons. Namecoin requires a minor fee to register an account, making spamming or denial of service financially unreasonable for an attacker. In fact, we see nameID (openID provider) and bitMessage (chat) use Namecoin for just these purposes.

When thinking of storing more complex information in the blockchain, one must be weary. Namecoin has some performance problems which arise from the nature of blockchain consistency. Each record is limited to 520 bytes, which may not be enough for systems which want to store complex information. Likewise, it is neither low-latency nor free. A block is mined every 10 minutes, and requires a Namecoin transaction fee to be paid to miners. It isn't reasonable to use it as a distributed file system, and will not become a global SQL database any time soon. 

But there really isn't a problem with this. If you're trying to store all of the state of your application in the blockchain, you're doing something wrong. Namecoin is an example of a more general usage of blockchains which offers a lot of promise. Consider an RSS feed of weather monitor updates. Every time it wants to update the reading, it places this data in a data structure that is replicated via a DHT or over the Bittorrent network. Now you can store torrent hashes in Namecoin, or you can place torrent hashes in a data structure in the network, and you can put that hash in the network. 

Like most things in Computer Science, many problems with using blockchains as a component of a larger system can be solved with another level of indirection. 


Resources


https://nameid.org/
https://bitmessage.org/wiki/Main_Page
https://github.com/namecoin/namecoin-legacy/blob/namecoinq/DESIGN-namecoin.md
https://wiki.namecoin.org/index.php?title=Identity
https://wiki.namecoin.org/index.php?title=Domain_Name_Specification

Sunday, September 18, 2016

Bitcoin: A Sequence of Proofs


Imagine a lawless land where only other lands' money was trusted. Doing business with them was slow, often hazardous. Theft was possible, because you paid people by giving them a copy of the key (credit card information) to your personal gold vault. This was unsafe, but you didn't know how to do commerce any other way. The point of sale needed to be instant, and the sale was often so far from home.
Now imagine, a new casino appeared which was larger than most amusement parks and wanted a stable internal currency for the significant transactions. In this lawless land, they didn't want to have people walking around with currency that could be stolen. They wanted something like gold, but they didn't want gold. They gave people checkbooks and promised them there was gold somewhere behind the money. As long as people believed in the added value, it existed. The economy grew and it became more valued for it's use as a currency.
Eventually, the casino was made non-profit. It had a controlled rate of releasing this new currency and started letting anybody work for it at any time of the day, paying them this currency.
The bank worked in an odd way. The volunteers watched customers' messages come up and ask for transactions to be made. They'd collect a certain quantity of them and go to check that they all followed the rules. They follow a specific policy for how to throw out transactions that can't happen at the same time. These transactions are signed with the customer's secrets, so they are definitely from the listed customers. After they check it, they store the correct transactions in-order in a signed envelope (Merkle tree) . They pay themselves a little bit at the start of each envelope, but customers also pay a little extra in the transaction for the worker. The worker must now go to the strangest machine in the land. 
The bank distributed a machine which examines the signed envelope and spits out a custom lottery game. This game was very slow, and the poor volunteer has to go through the process of playing the lottery until he wins. 
But it gets worse.
If another lottery player collected a copy of that same quantity of messages, they could also start playing the lottery. And they're almost certainly going to be. They might start before or after, but it doesn't really matter. A million losses doesn't make the next lottery ticket any more likely to be a winner. Each event is independent.
If they win first, then their envelope goes into the casino's history ledger and they get the reward they paid themselves. All of your lottery playing is for nothing.
We should clarify that lottery playing doesn't make anybody money when a loss occurs. This lottery is entirely done for the people and by the people. 
Eventually, people decided to team up and share the rewards. Big clubs of lottery players formed which promised a pay out equivalent to the work done. People were happy, and they played on. After a day's pay, they took their coin home and spread it far throughout the economy. 
Ordering of events in a distributed system is a hard problem. It's one of the hard problems in Computer Science. Even if all of the packets are sent in the same order to all peers, having multiple sources of events means that events are doomed to be reordered. Some of the time, this doesn't matter. Most of the internet is optimized around the idea of allowing unrelated transactions to be reordered and buffered. Unfortunately, sometimes it is necessary to have the entire network agree on a linear history.

This is the case in systems where exposed actions can make resources inaccessible. An example is currency; a person is only able to spend funds for the small window in time that they own them. Blockchain technologies form an explicit linear ordering of synchronized operations enforced in a distributed manner.

The name comes from the fact that the envelopes from the story above are called "blocks" in Bitcoin. Transactions fill blocks, which have lotteries played over them through proof-of-work. This process is called "mining". The democratic mob votes on a history of transactions to accept by gambling at hash values.

While the first lottery writer might have won a reward from the casino in our parable, the reward only holds if it holds in the longest chain of history. Good nodes will reject the invalid transaction history in the mined transaction chunk (block), and will collectively play the odds better than the bad actor. They will eventually catch up. The good history will grow larger, and the bad actor will lose it all.

Economics


The value of bitcoin is set by supply and demand, like everything else in economics. Relative prices for goods between currencies should be stable, else people label it arbitrage and abuse it until it is fixed. This means that the value of bitcoin is a consensus had by people on how much utility there is in being able to spend bitcoins. 

If people are willing to exchange 50 chickens for a bitcoin, then a bitcoin is worth your currency's equivalent of 50 chickens. If bitcoins are difficult to find in your corner of the world, then they might be worth more. 

Threat Model


Bitcoin's only tractable attack is an attack on the integrity of the blockchain's linear history. Bitcoin is a set of rules for creating a data structure, the blockchain. The rules assume that whomever owns the private key associated with bitcoins is the owner of the bitcoins. When the customer wants to place a transaction, they need to sign the transaction. The ability to sign a transaction is a proof that one has the private key associated with the public key that the bitcoin was last sent to. The "last sent to" part is the reason that a linear history is necessary.

If you try to spend bitcoin that has already been spent, the mining nodes will reject your message. The only real practical attack on the blockchain is to spend bitcoins at a vendor, get the service after the transaction is in the blockchain, and then mine a block that has you sending the bitcoin elsewhere. This "history rewriting" requires the attacker to do the proof-of-work for all blocks after the rewritten one. Bitcoin specifies that the longest chain is the one that is correct. This means that the attacker needs to outrace the rest of the network in order to convince them to accept the evil fork.

The farther back in history, the more work that an attacker must do to catch up with the rest of the network and overcome them. This means that after a number of blocks have been mined, transactions are more or less unable to be altered. This gives the system a consensus on the linear ordering of transactions. Note though that integrity depends on the majority of the network choosing a chain that can be verified.

Cryptography Used


Bitcoin uses HashCash for proof of work. HashCash works much like the lottery metraphor above. One has a block of information to hash, including a nonce that is the miner's current lottery guess. The work difficulty is set by requiring the worker's nonce make the SHA256 hash lower than a given value. The output hash will vary in an unpredictable way as the nonce is incremented, making this search a pseudorandom Poisson process. Bitcoin has a rather difficult proof of work. Every two weeks, the Bitcoin network calibrates the current proof-of-work difficulty to reflect capacities. 

The other major use of cryptography in Bitcoin is the use of public key cryptography to handle a person's account balance. The transaction sends the currency to a person's public key. The private key associated with this public key then gains the power of that currency. The funds holder can sign transactions and smart contracts with this key to express consent. 

Sequence of Proofs


That brings us to a powerful alternate view of the blockchain that is the reason why it generalizes. It's not a key-value store of bank balances. It's a sequence of proofs. The existence of the completed proof-of-work is proof of the existence of the payout coins. Every future transaction with the coins refers to the transaction which created the coins being spent. It is the burden of the network to assert that you are referring to the latest state of that resource.

Consider instead a system of games of chess with betting. Each transaction "pays" the turn to the next payer, with alteration. Checking board rules would fall to the miners. You can prevent turn alteration this way.

Consider instead a blockchain that began with a number of facts and implications, along with implications such as Modus Ponens. From these, you could make each transaction identify itself as a usage of a reasoning rule, and it could refer back to the pieces of data which inhabit it. The proof checking would only have to check the latest transaction, as the assumptions of the proof being fulfilled are asserted to be true because they are older than it in the blockchain. Reasoned, accountable arguments could be carried out this way.

Scalability Problems


The previous chess network would be miserable to use in all likelihood. If the proof-of-work is sufficiently hard, then it should take minutes to get a move accepted by the blockchain. Interactive applications seeking to use the blockchain run up against many issues. Current bitcoin mining even runs up against this problem. Long transaction times result from the throughput limitations built into the bitcoin network by setting the difficulty.

This difficulty pushes miners to turn toward pools. As mining pools control a larger and larger fraction of the world's mining power, the bitcoin network suddenly becomes dependent on these mining pools being good actors. The distributed system of bitcoin becomes a decentralized system, compromising many of the initial promises.

This stems from the fact that proof-of-work is inherently costly. There are other ways to address the problems of double spending, which we will cover. Most of the problems related to bitcoin is due to incorrect usage of bitcoin. Bitcoin is designed to be a ledger which has the last word. It promises linear, strong consistency between transactions on the blockchain. Many applications are alright with eventual consistency. We will see that it is possible to conduct business off of the blockchain which can then have the state at the end of the session persist. 

Solutions


Proof of Stake And Tattling


A potential solution to the growing pains of Bitcoin is the use of proof-of-stake rather than proof-of-work. An attacker which has a stake in the history already on the blockchain is unlikely to jeopardize it. In proof-of-stake, the cryptocurrency is paid by the miners into the bets of the next block to win. If an attacker bets on multiple chains, then they're guaranteed to lose money. This, combined with the fact that buying a lot of currency is more expensive than a lot of computer power, makes proof-of-stake practical. We will cover Peercoin later, which does proof of stake and has other mitigations for certain attacks. 

An interesting idea is vote tattling. When an attacker votes on one block with a predecessor, and then votes on another with the same predecessor, peers can observe this. They can report double voting by using the votes as cryptographically-verified evidence, and taking the attacker's vote-money. 


Micro-Payments


Many times, entities want low-latency transactions over a short duration of time. Betting, bars, and moonlighting with frequent and small rewards are all examples. In order to achieve that, micropayment channels can be used. 

Bitcoin has the ability to represent fairly complex transactions, much more complex than simple money exchange. The transactions are written around a number of simple opcodes. One such opcode gives ability to hold locked funds until a certain timeframe has elapsed.  

Note also, that bitcoin transactions have their own meanings even before being entered in the blockcahin. These two facts allow people to set up a channel of payments. In a payment channel, the sender places a certain amount of money under time-lock. Until the time specified in the bitcoin transaction, the sender can't access their funds unless the receiver allows them to. Now the receiver can have the sender sign bitcoin transactions of increasingly larger value as the payments accumulate down the channel. 

None of these signed transactions are published until the channel is closed. When closed, the receiver publishes the latest signed transaction. It ends the time-lock and sends the confirmed funds to the receiver and refunds the remaining funds to the sender. If the receiver doesn't finish this handshake, and sacrifices their payments, then the transaction times out and the sender can do whatever they like with the locked funds. 

Monero


One issue that should be jumping out to the attentive reader is that bitcoin itself offers no anonymity. The public ledger has a timeline of all funds moved. People trying to replace other transaction mechanisms with bitcoin have to worry if the timing, quantities, frequency, and partners of bitcoin trades could give the world enough information to target them for abuse or persecution. This isn't the only problem; law enforcement might be able to convince miners to refuse to do business with your bitcoins. The Bitcoin system is not resilient to all kinds of attacks. 

Monero is to Bitcoin as Tor is to the public internet. It uses a ring signature to allow people trying to transact to use each other's input transactions instead. The interaction between people is randomized and mostly one-way, decreasing the possible attack surface. 

Monero fights the scalability effects on trust centralization due to mining pools in two ways. Monero doesn't allow for transaction fees to be set, instead making all blocks equally attractive to mine. This defeats a scaling issue which bitcoin will face when the reward for mining will come entirely from transaction fees. Secondly they obfuscate transactions to protect miners from effective legal demands. 

Related Resources

https://en.bitcoin.it/wiki/Main_Page
https://bitcoin.org/bitcoin.pdf
https://lightning.network/lightning-network-paper.pdf
https://getmonero.org/home
http://joel.mn/post/103546215249/the-blockchain-application-stack
https://21.co/learn/intro-to-micropayment-channels/#implementation-detailshttp://counterparty.io/platform/
https://gendal.me/2014/10/26/a-simple-explanation-of-bitcoin-sidechains/
http://weuse.cash/ring-signatures/

Friday, September 9, 2016

How the Invisible Internet Project (I2P) Works


I2P is an onion routing network very similar to Tor which seeks to address architectural decisions that many think make Tor an easier target for a powerful attacker. I2P doesn't use a centralized handful of Directory Servers, instead using distributed hash tables (DHTs hereafter) to coordinate state. The design of I2P is heavily distributed and router roles are much more homogenous than Tor. This is enabled by not allowing connections out of the network. All services in I2P are what Tor would call hidden services. Many would say that design makes it harder for an adversary to observe I2P participants and coerce them into leaving. Others would say that I2P's smaller size makes it easier for attackers to flood with evil peers. We will consider the utility of I2P's response to it's threat model.

Thursday, September 1, 2016

How Tor Works

Introduction


The Tor network is a public distributed network which uses cryptography to provide probabilistic anonymity. It establishes TCP streams from one end of a global cluster of untrusted proxies to the other, hiding traffic source information while allowing bidirectional network traffic over the stream. We will study how intelligent network architecture as well as cryptography allows for this real-life system to provide service in the face of coordinated attacks by very powerful adversaries.