Friday, September 9, 2016

How the Invisible Internet Project (I2P) Works

I2P is an onion routing network very similar to Tor which seeks to address architectural decisions that many think make Tor an easier target for a powerful attacker. I2P doesn't use a centralized handful of Directory Servers, instead using distributed hash tables (DHTs hereafter) to coordinate state. The design of I2P is heavily distributed and router roles are much more homogenous than Tor. This is enabled by not allowing connections out of the network. All services in I2P are what Tor would call hidden services. Many would say that design makes it harder for an adversary to observe I2P participants and coerce them into leaving. Others would say that I2P's smaller size makes it easier for attackers to flood with evil peers. We will consider the utility of I2P's response to it's threat model.

Threat Model

I2P's network is small, much smaller than Tor. Despite designing to scale to millions of users, it suffers from the privacy issues of small anonymity networks. Many attacks which should be impossible are tractable to a state-level attacker. 

If I2P is deemed to be illegal in a country, a sufficiently motivated attacker may make efforts to identify I2P users and summarily jail them. As all routers are stored globally in the DHT, this is a more realistic attack than one would think. The built-in rate limiting in the DHT system makes this less effective, but it is an attack nonetheless. 

It's not that easy to simply observe I2P traffic and to know that it's I2P traffic. Every byte is encrypted or part of the handshake, making it harder to isolate from other encrypted TCP. The use of randomized ports across TCP and UDP also makes it harder to fingerprint. 

An active adversary with passive widespread surveillance is an entirely different matter though. An ISP colluding with an agent connected to the network could have the agent send a peer data of certain sizes. Over time, peers sending less data can be ruled out until only the recipient is left. This attack is practical in a small network like I2P. Similarly, timing attacks can apply. They're much harder in I2P though, due to queuing, message batching, and throttling. This kind of attacker is rare, as attacks can be quite costly since they tend to require global attacks in order to isolate single peers.

More reasonable is for an attacker to take on the entire network by creating many evil nodes. Eventually, a good portion of the network will route through only the attacker's nodes. As the network grows, this becomes more difficult. This is known as a Sybil attack. There are token protections against this type of attack. For instance, peers from the same IP range won't be in the same tunnel. This is not even remotely sufficient though. If an attacker controls 20% of the peers in the network, they can refuse to route messages unless predecessors and successors are both bad nodes. This will force a tunnel with only the attacker's nodes, or will prevent the peer from using I2P at all.

This brings up a good point. An attacker doesn't need to break into I2P if they can make it unusable to their target group. By applying load to the network, I2P's latency can be increased until clients will be forced to abandon it. This is an effective attack. I2P has safeguards and throttling at many layers, but a motivated adversary could still degrade the system with enough evil peers. 

Lastly, the distributed nature of I2P's DHTs make them a target for attack. The fact that a router advertising a public key will sign their information with it makes it very difficult to insert information which has much of an effect. The DHT-serving routers have rate limiting, and there are many of them. This makes a denial of service attack much harder, since peers can route around nodes under stress. Stressing the entire DHT system is expensive, but doable. The current mitigations are considered imperfect but competitive.

Tunnel Creation

The routing differences between Tor and I2P stem primarily from the fact that I2P doesn't connect to the public internet by default. Tunnels in I2P work slightly similarly to how hidden services work in Tor, but are faster because I2P was designed with hidden services as the primary use. It comes down in part to the fact that a Tor Hidden Service requires more initial connection coordination and requires more intermediate relays to send the message along. 

Tunnels are much shorter-lived in I2P by design than in Tor. Whereas Tor will multiplex messages through a long-lived circuit, I2P's tunnel policy makes this tunnel conservation less useful. Tunnels in I2P are unidirectional; every client has a different set of routers for outgoing traffic than they do for incoming traffic. This partitions the nodes along the tunnel into roles which each behave differently. This is quite similar to Tor's Guard/Intermediate/Exit relay system works, with some important differences. 

In the tunnel pipeline, the "Gateway", "Participant" and "Endpoint" router roles correspond to the Guard, Intermediate, and Exit relay roles respectively. There are 0-hop, 1-hop, 2-hop, and 3-hop tunnels. Tor's standard architecture is to use 3-hop tunnels, and I2P recommends 3-hop tunnels for maximum effective security. We explained why this policy makes sense in the Tor post.

The Gateway is responsible for receiving a number of messages and batching them up into a datum called a "garlic message." Garlic messages will be encrypted by the Gateway and passed along to the first Participant. The Participants know very little, and simply removes a layer of encryption to read the forwarding address and passes the message along to the next Participant or the Endpoint. The Endpoint must decrypt the garlic message and figure out what to do with it. We see above that Alice's Endpoint talks to Bob's Gateway. The role of Gateway/Participant/Endpoint in the pipeline is fixed, Alice's Endpoint doesn't talk to Bob's Endpoint.

Note that Alice doesn't necessarily need to choose the same Outbound Gateway every time. I2P has not chosen to use Guard nodes as Tor has. This makes it more likely for an attacker which can run a Sybil attack to force a peer to rapidly recreate tunnels and eventually route through only attacker nodes. On the other hand, Tor's Guard nodes have a recent history of poisoning []. The fact that you're choosing from a smaller pool of relays for Guards can make it easier for a patient attacker to control a fraction of the Guards.

Also note that there is no single computer which appears to be the source of all of the traffic. In Tor, the Exit Node will appear to the outside world to be the source of all traffic it routes outbound. This requires heroic volunteers willing to defend themselves from government agencies who observe illegal traffic. Rather than getting a large pool of people who hope that "I'm running Tor" is enough to avoid culpability for exit node traffic, I2P makes it so that only Bob can see the traffic destined for Bob. On the other hand, Bob must be running an I2P service for this to work. By requiring that all participants hosting services do so through I2P, the legal risk associated with relays of the network are decreased.

This latter point is why I2P tends to have higher peer participation than Tor. Tor has relays and Tor has clients, and only a few committed souls are both. Most I2P clients with reliable internet connections become routers for other peers. By making it less costly to route traffic, I2P ensures that the routing capabilities will scale with the network. Contrast this to the problems that the Tor network would face if the relay set remained the same but the number of clients were to increase tenfold. The directory service and the onion routing of Tor would face traffic saturation. This also makes it harder for an adversary to ban all I2P routers. Tor's relays are more fixed than I2P's, requiring more obfuscation and centralized secret accumulation (bridge relay list) to avoid state-level attackers from banning them. 

Onion Routing Transit Cryptography

Unlike circuits in the Tor network, I2P tunnels are much shorter lived. Furthermore the use of separate tunnels for incoming and outgoing traffic makes it difficult for a peer to associate a key with a specific message. This makes the Tor policy of negotiating an AES shared key with each relay in the circuit slightly too simplistic for I2P. Instead, I2P uses an interesting system of explicitly-managed key reuse. 

When an I2P router wants to coordinate communication with a router, the sending router will use the asymmetric cryptosystem ElGamal (a Diffie-Hellman variant) to encrypt an AES256 key (called the SessionKey) for use in future communication streams. Rather than stopping here, the sending router will also create a number of SessionTags and encrypt them in the message body. Each SessionTag is a 32-bit random nonce, which is an arbitrary number that can be associated with at most one message. When the receiving router gets this message, it will store the AES256 key in a key/value system with the SessionTags as keys. Since SessionTags can only be used once, routers must re-negotiate a new SessionKey after all SessionTags are used up.

When future messages are received, the router will check if the first 32 bytes are a SessionTag. If so, then the router will use the SessionKey associated with that SessionTag to decrypt the message. This removes the SessionTag from the router's collection. This makes it impossible for an adversary to notice a reused SessionTag, and infer that the same two routers are responsible for these messages. If the first 32 bytes are not, then the router tries to decrypt the message using asymmetric encryption. If the decrypted payload has a correct checksum then the SessionTags and SessionKey are taken from the message body. Otherwise, it is refused.

In order for the sending router to know that the SessionTag handshake completed correctly, it will frequently bundle information for the inbound tunnel into the message. The router can then send back an acknowledgement of correct communication. It's worth repeating one more time that I2P tunnels are unidirectional, unlike Tor, so these SessionKeys are used for communication in only one direction. This limits the ability for an adversary to compromise communication in one direction and get the entire message stream.

The Distributed Network Database

Rather than rely on a centralized set of directory servers, like Tor, I2P uses two distributed hash tables to coordinate the state of the network. This offers added scalability. There are 10 Tor directory servers at the time of writing. (Find them at These are trusted servers; the integrity of the entire Tor consensus rests on them being good actors. In contrast, I2P puts the data in the hands of everybody and the trust in the hands of nobody. 

As we will later get to, all routers in I2P have their traffic rates measured in a trustworthy way. It's not entirely self-reported. This means that we have a reliable way to figure out which peers are the fastest and most accessible to a given router. These routers are called "floodfill" routers and are used to carry the Network Database. 

In a recurring theme in these systems, we gain resilience against compromise through public key cryptography. A router publishes a public key and a list of attributes associated with the itself, and signs the message cryptographically.  When a router wants to insert data into NetDB, it picks a floodfill router and sends it the data. It waits 10 seconds and then picks another floodfill router and requests the information. If it can't find the information, then it repeats this. Eventually, the network accepts the insert. 

Insertion works by doing an XOR with the hash of the key and each floodfill router hash to find the value of the distance. The closest floodfill routers will get the value pushed to them. This redundancy builds peer loss tolerance into the DHT. Because there is a redundancy of 8x currently, and because the network is small currently, we have a O(1) rather than a O(log n) query. 

I2P has two DHTs required to build tunnels.  There are the RouterInfo and the LeaseSet, both of which use the SHA256 of a router's "identity" public key as the key. The RouterInfo table describes the attributes of the peer's I2P router which is connected to the network. The router's identity public key, IP address, capabilities, and a hash of the above are inserted into the network. LeaseSets describe the inbound router to use as the starting point (Gateway router below) for the tunnel. It also describes a key to use when talking to that router to uniquely identify the particular tunnel that is used for inbound connections. RouterInfos do not expire by default, peers will manage a local cache as per client preferences. LeaseSets do expire because inbound tunnels expire. 

Peer Profiling

If a peer could lie about their performance in the RouterInfo, then an adversary could force the network to route a lot of traffic through a few slow nodes. This would make the network unusable, a very effective attack. RouterInfo only allows peers to say "I'm too slow to be useful." I2P, like Tor, will measure the performance of peers so as to accurately route traffic between them. Over time, the capacity and speed will be found.

This allows peers to be sorted into three groups: high capacity, fast, and standard. High capacity routers have a capacity higher than their peers. A fast router is a high capacity router which is faster than the median speed of the router's peers. Standard routers are the remaining routers.

Peers snoop on each other and create profiles on their tunnel drop rate, their NetDB response latency, and other attributes. A router saves their peers' profiles to disk on shutdown, to make rejoining the network easier. Clients are expected to determine eviction rate of profiles. 

Peer Selection

Peers are selected from the "fast" group when the router is going to be used to communicate with a hosted service on another peer. Peers are selected from the standard group randomly for tasks that are acceptable to have low bandwidth, such as DHT lookups and client tunnel test packets. These "exploratory" tunnels will fall back on fast routers when the standard routers begin to provide too much latency. 

The I2P architecture dictates certain important restrictions on the peers that can be chosen when creating a tunnel in the name of security. The following are forbidden:
  • Two peers from the same /16 IP space may not be in the same tunnel.
  • A peer may participate in a maximum of 33% of all tunnels created by the router.
  • Peers with extremely low bandwidth are not used.
  • Peers for which a recent connection attempt failed are not used.
It's also worth noting that an ordering is applied to all peers in order to avoid the Predecessor attack [] . That is, if node A comes before node B in a tunnel then node A will always come before B in all tunnels.


We've looked at a second variation on the classic design of onion routing. I2P is an interesting architecture because of the aggressively decentralized nature of it. Going after I2P peers offers little benefit, as routers are fairly homogenous. Likewise the lack of a connection to the public internet means that no exit node is required. There is no peer both listening to and transmitting the plaintext of this outbound traffic for sites to point a finger at. This more insular network containing only hidden services looks like a solution which can scale indefinitely as nodes are added. Some of the algorithms related to peer manipulation and search may need to be improved at some point, but these changes would not require major protocol revision. If the world changed in such a way that secure communication was more important to most than access to mainstream social networking, I2P looks like a fine network to take on the challenge. 


No comments:

Post a Comment