The Tor network is a public distributed network which uses cryptography to provide probabilistic anonymity. It establishes TCP streams from one end of a global cluster of untrusted proxies to the other, hiding traffic source information while allowing bidirectional network traffic over the stream. We will study how intelligent network architecture as well as cryptography allows for this real-life system to provide service in the face of coordinated attacks by very powerful adversaries.
Threat Model and Goals
Tor differs from many principled distributed architectures in that the primary concern is usability. Rather than making promises about the security of the system and accepting usability restrictions, Tor promises that it can be used for Web traffic and interactive applications. As we will see later, this informs the decision to accept a certain level of vulnerability to timing attacks.
At first, Tor's goal was to allow for privacy of traffic. It should be unfeasible for an eavesdropper or hostile network member to uniquely identify the source of a given TCP packet that leaves the network. As many people who require privacy live in countries where the network is blocked, state-level attackers comprise many of Tor's series adversaries. This expanded Tor's threat model to include resistance to censorship from an adversary capable of enumerating many machines and completely controlling traffic within an IP range. This has motivated changes in relay discovery since the initial Tor architecture.
Onion Routing 101
Imagine that you wanted to get a packet from Alice's computer to Bob's web server. Let's say that Alice lives in a country where people who shop for pressure cookers and backpacks get raids from federal agents. Now Bob sells a lot of household items and he unfortunately his ISP has been compelled by the government to give them all of his traffic. Alice doesn't want to get raided because her neighbors are light sleepers. How can we get Alice's packet to Bob without giving the government somebody to intimidate, pay, or torture to get the information?
So the first idea is for Alice to have a friend carry the packets along for her. She sets up a TLS connection with a web proxy on her friend's server. This friend lets anybody on the internet use it, so the government has no idea which of the 10k citizens are shopping for things on the naughty list. This has some terrible properties though.
This server is now "trusted." This isn't "trusted" in the sense that it's trustworthy, but "trusted" in that you must trust this relay with the entirety of your security. If this server is run in Alice's country, or her country is an ally of the country the server runs in, her friend running the server might get raided. Friends don't get friends raided. Also, the issue of timing and packet size comes into play. If an attacker watches all of the traffic coming in and out of the proxy, and if Alice is the only one sending a packet every 50ms with a 2kb payload, then the attacker can trivially associate Alice's input with the proxy's output.
What if you were to chain together multiple proxies? You have the same problems without cryptography. Every proxy is potentially the weakest link in the chain. You have a weaker, not a stronger system. Now let's give every proxy a public/private keypair. You could encrypt the message for the last proxy, and now all of the other proxies can't read it. Furthermore, the last proxy only knows the address of the server sending the message to it. If the attacker shows up at their door, they can explain that they're unable to help without lying. The issue now is that all of the proxies before the last in the chain know the address of the last proxy. This means that the first proxy in the chain sees both Alice's IP address and the IP address of the last proxy; the first proxy can tell the attacker everything it needs to know.
What you do is that you encrypt the message with the public key for the last proxy first. Then you create a message for the next-to-last server saying "send this encrypted message to the proxy with this IP". You encrypt this message with the public key for the next to last proxy. You do this for each step in the chain until you reach the first proxy. You can encrypt with the public key for this relay, but Tor doesn't because Tor connects Alice to the first proxy with TLS. TLS is already secured using public key cryptography.
Now we've discussed that public key cryptography is slow. If you were to use it for all of the packets as we described, you'd have a very slow network. Also, a government would be able to go to each proxy server owner and make them decrypt the given packet because the proxy's public/private keypair remains the same. Instead, we can use public key cryptography to agree on a symmetric encryption key for each server. Now we still encrypt the packet in layers from last relay to first relay, but we use a temporary key that is fast to use. At the end of every hour or so, all of these proxies must throw away their keys. It's typically outside of an attacker's power to seize many servers spanning many countries within an hour. This is called perfect forward secrecy because a security breach in the future doesn't jeopardize the security of past traffic.
This doesn't prevent the timing attack which we spoke about. The Tor network could work around this problem by collecting all of the messages to send in each 10-minute period, add padding on the end of each message to get a uniform size, and shuffle them all around. The Tor project chose not to do this because it would make the system unsuitable for interactive web browsing. Instead, the Tor network has decided that this is a failure state. The Tor project has a number of strategies to make it unlikely for a single attacker to be able to observe enough of the exit relays and enough of the entry relays into the network to correlate messages via size and timing. These don't always work. Later we will cover the most recent and most successful attack on the Tor network.
Circuits and Connection Pooling
Onion routing as it had existed before Tor would simply create a new circuit for each TCP connection that the user needed. The issue with this is that some web traffic patterns can create and dispose of hundreds of connections regularly. In order to achieve the promise of usability, Tor has an interesting connection pooling architecture that tries to reuse routes through the network (called circuits).
When a client tries to connect to the network, they will first get a list of the relays they can use. From this, they will choose an exit relay, a guard (entry) relay, and one intermediate relay. Tor will make sure that these relays are geographically far apart, and are not hosted by the same people (self-reported). We will later explain the differences in relay types. We only choose 3 hops because adding more hops doesn't really add security. Each relay can only know the predecessor and successor, so 3 hops will provide the required amount of isolation. Furthermore circuits with more hops must go through more servers, increasing the likelihood that an attacker's server is included in this chain.
Once a client has chosen the relays to use, it will connect to the guard relay over TLS. Since TLS uses secure, fast communication with good key negotiation, there is no other crypto used between the client and the guard server. The client will then send "Cells" (tor protocol packets) which tell the guard relay to connect to the intermediate relay. Cells can be commands for relays, or can be data payload packets. Further cells are sent to the intermediate relay in order to carry out a Diffie-Hellman key negotiation. After this is done, the intermediate relay is likewise commanded to extend the circuit to the exit relay.
It's worth noting that it is important to use integrity checking on these data streams. While an attacker can't read the messages, if an attacker could guess the plaintext then they could find out the stream of bits that were used to encrypt this specific packet. Future packets are not compromised by this, but the attacker gains the ability to modify the message. Certain steps in the protocol can be modified, and the network can have protocol invariants broken.
TCP packets which are sent through Tor after this circuit is built will move through these connections, encrypted symmetrically for each relay. At the exit relay, the final plaintext message is visible. Since circuits are shared for different TCP connections, it is the duty of the exit relay to make sure that the packet is put into the correct TCP stream. This works in most cases, but puts the burden of traffic cleanliness on the user. A classic example was the issue of BitTorrent.
BitTorrent will embed the IP address of the user in the data packets. This means that anybody seeing a BitTorrent packet leave the Tor network can associate it with the client on the other end. Even worse, an attacker can associate this exit relay with the circuit which is tunneling all of the client's TCP streams. Because of attacks like this, Tor now will use a new circuit for certain kinds of traffic such as building a circuit, sending a message, hosting a hidden service, or joining as a relay or exit relay. BitTorrent is still bad to run over Tor, but most Tor clients can try to prevent it from sharing circuits with Web traffic.
This is where Tor's directory system comes into play. The directory system is a collection of router descriptors and states. The directory system of Tor is actually not distributed. It's decentralized to a degree. There are currently eight directory authorities for the deployed Tor cluster, but it managed to operate with only three for the longest time. These authorities have their public keys and hard-coded into the software distribution of Tor. That reliance upon this system of a handful servers has remained cryptographically protected against nation-states for so long is quite a surprise.
This system has gone through three changes. At first, these servers were fully trusted. Clients would connect to all three servers and would get the state of the cluster from the most recent one spoken to. They got this document directly and it was sent without encryption. When authorities disagreed, clients would get different documents. Even worse, an attacker could tell which document each client got. They knew which relays clients could use. Also, there was no consensus between all three servers that was required. Lastly, each server was subject to the load of the entire cluster. This fundamentally threatened the ability of Tor to scale.
- The first major change was to separate the data into a document of relay attributes and a document of relay states. The latter could be used to keep a consistent view of the subset of the cluster that is available both during the state fetch and the last descriptor fetch. This state document was much cheaper to send.
- The second change was to prevent rogue authorities from controlling a client's view of the network. This was done by fetching a more complex document from each authority which represented their view of the network. This allowed clients to find the intersection of trust between the authorities, potentially partitioning clients based on document variation throughout time. Since the documents were sent unencrypted, fingerprinting was feasible. Attacks in this vein could allow an attacker to eventually narrow down traffic sources. This moved more load from the authorities to the clients. Connecting to the network now required fetching a few megabytes of descriptors.
- The last major change was to explicitly create a consensus document. The authorities will find a consensus view of the network once per hour using an ad-hoc consensus algorithm. They will all sign this view of the network and will distribute it for the next hour. This document used a much compressed format of router descriptors, saving space. This gives all clients a consistent view of the network and decreases the footprint of the directory documents. Clients will now sometimes create a one-hop Tor connection to fetch the next document. This prevents an attacker from fingerprinting documents and profiling which clients see which network states propagate.
How does a relay join the Tor cluster?
What's interesting about the Tor cluster is that bandwidth allocation is integrated into it. A non-exit relay starts trying to join the cluster by forming four circuits to itself and sending a benchmark self-test to measure the bandwidth it can carry. This it put into a router descriptor and signed and is offered to the directory authorities. Here we see that the directory authorities are acting as public key distribution agents much like CA authorities do in the HTTP world.
At first, the cluster doesn't trust you. It won't really route much traffic (if any) through you. This routing is done by by reporting your bandwidth availability in the consensus document. Clients will choose you with a probability proportional to your bandwidth availability, and so few clients will choose you for a circuit. All the while though, relays in the network will measure you. If they find you fast and stable, they'll slowly allocate more and more traffic to you. As your traffic slice increases, your circuit speeds will be measured to find out what you can really provide.
You'll get more traffic but it will be limited still because you won't be a guard relay. If every circuit is new then the chance of not picking an attacker's relays for the first and last hops goes to 0 over enough time. If you remain with the first hop over time, then you either choose an attacker and lose any time your other relays are attacker relays or you choose a good relay and all of your paths are safe. This stable first relay must be reliable or the performance will be terrible. This relay is a guard relay. This is essentially a way to make it more probable to be safe.
For you to become a guard relay you must be stable for 8 days. After you're a guard relay you'll be moved up into a separate class of relay which is rotated through by clients over time. Eventually the number of clients that cycle out of you will equal the number which cycle in. You will have your bandwidth mostly saturated at this point.
If you were an exit relay, you would see traffic leaving you with speed once your bandwidth allocation was being scaled up.
I find the heterogeneous nature of Tor quite interesting. There is the decentralized cluster state authority system which uses a small number of trusted people to verify the truth of network observations. The actual routing policy partitions relays into where they will be on the paths through the relays. In each and every step, a relay is only allowed to perform the steps that we expect it to achieve from past behavior. Throughout all of this, we see that a fungible resource is used to make attacking Tor from the inside expensive. This resource is bandwidth over time; this resource is identity.
Attackers tend to use anonymity to make attacks cheap to carry out. A computer can participate in a denial of service attack or inject traffic frequently by changing IP address. We do not put a relay in a position to degrade the network until it has behaved well for a long time. After placing it there, we constantly observe it. When it behaves badly, we eject it from the network. We see that the cost of attacking Tor is the maintenance and loss of a cryptographically-verified trustworthy identity which must be maintained at the cost of network bandwidth. By assuming that every network member is anonymous, the Tor protocol intelligently creates an economy of trust and makes untrustworthiness expensive.