Saturday, October 22, 2016

How BitTorrent Really Works



BitTorrent is both ambitious and simple. BitTorrent is a P2P protocol in which peers coordinate to distribute requested files. In order to resist downtime due to real-world seizure of computers, BitTorrent has had to progress to a fully distributed architecture, without any single point of failure. This is an impressive technical feat.

Even more impressive is that BitTorrent gets faster with additional content-fetchers, rather than slower. The classic economics of content distribution is suddenly inverted, rewarding high-desirability content.

It's no surprise, then, that BitTorrent is used nowadays for everything from sharing Linux ISO files to live broadcast streaming of sports and politics. BitTorrent's name is still controversial in many places because of its reputation as subversive software. BitTorrent's power made it the first choice for piracy, which led many to conclude that BitTorrent is only useful for piracy. While many ISPs and externally-administered networks attempt to block and trace BitTorrent, the fight has largely been lost.

By not placing restrictions on peers, BitTorrent opens itself up to a universe of attacks. Like other architectures, a combination of limited observability and sound mathematics is the solution. As we will see, the architecture prevents an evil actor from serving a corrupted file or causing undue load on the BitTorrent network.

Lastly, BitTorrent is forward-thinking. It contains an extension protocol that allows clients to design protocols that alter the behavior of peers, and enables peers to intelligently fall back upon the extensions that both sides support. Underneath all of this is the basic peer protocol, which ensures that clients can agree on enough to simply serve the file even if they share no extensions.

The Peer Protocol

When a peer wants to start sharing a file, it constructs a metadata file that describes the attributes of the file as well as a number of options. BitTorrent uses bencoding for most data sent. Bencoding marks integers, lists, and dictionaries with a leading type character and prefixes each string with its length. The metadata describes the files in the torrent, and also includes a SHA-1 hash of each of the "pieces" or file fragments in the torrent. These fragments can be downloaded individually, allowing for streaming or for selective downloading.
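As a rough illustration of the per-piece hashes, here is a minimal Python sketch. The function names, the sample payload, and the 4096-byte piece length are assumptions made for the example; real torrents typically use much larger pieces.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]

def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Check a downloaded piece against the digest listed in the metadata."""
    return hashlib.sha1(piece).digest() == hashes[index]

# Hypothetical content: 16,000 bytes split into 4096-byte pieces.
content = b"example payload " * 1000
hashes = piece_hashes(content, piece_length=4096)
```

Because every piece has its own digest, a downloader can check piece 1 on its own, without waiting for the rest of the file.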

The file's attributes are known as the "info" block, which is what uniquely defines the torrent. The info block's hash is the torrent's unique identifier in the BitTorrent swarm of peers. This metadata file also announces a tracker that the torrent will be associated with. The tracker sits outside of the info block, so that multiple trackers can track the same torrent under the same infohash.
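To make the infohash concrete, here is a small sketch of a bencoder and of hashing only the info block. The tracker URLs and the contents of the info dictionary are made up for illustration, and the encoder handles only the four bencoded types.

```python
import hashlib

def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists, and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))

def info_hash(metainfo: dict) -> bytes:
    """SHA-1 of the bencoded info block only; the tracker is not included."""
    return hashlib.sha1(bencode(metainfo["info"])).digest()

# Hypothetical metadata: same info block, two different trackers.
info = {"name": "example.iso", "length": 1048576,
        "piece length": 262144, "pieces": b"\x00" * 80}
torrent_a = {"announce": "http://tracker-a.example/announce", "info": info}
torrent_b = {"announce": "http://tracker-b.example/announce", "info": info}
```

The two metadata files differ byte for byte, yet `info_hash(torrent_a)` and `info_hash(torrent_b)` are equal, so both trackers end up describing the same swarm.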

This metadata file is half of what a downloader needs to know to download the file. The other half is the list of peers serving the torrent. Conventionally, a peer will query the torrent tracker for a list of peers serving the file. The distributed hash table, peer exchange, and local service discovery are all other methods. We will discuss the first later. The latter two can be thought of as "gossip" protocol extensions that allow peers to become known by the swarm. 

Once a peer has a list of peers, and has connected to each of them over TCP (or the uTorrent transport protocol, not covered here), it uses the peer protocol to fetch all of the files. These peer connections are bidirectional and have attributes set on them by either side. Peers will announce when they have finished downloading a piece, so that peers connected to them know whether they want anything from a certain peer.
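For a feel of what travels over such a connection, here is a sketch of two peer protocol messages as BEP 3 lays them out: a four-byte big-endian length prefix, a one-byte message ID, then the payload. The helper names are my own, and the decoder only handles messages whose payload is made of 32-bit integers.

```python
import struct

HAVE = 4      # "I have finished piece N"
REQUEST = 6   # "send me `length` bytes of piece `index` starting at `begin`"

def encode_have(piece_index: int) -> bytes:
    # Length prefix 5 = 1 byte of message ID + 4 bytes of piece index.
    return struct.pack(">IBI", 5, HAVE, piece_index)

def encode_request(index: int, begin: int, length: int) -> bytes:
    # Length prefix 13 = 1 byte of message ID + three 4-byte integers.
    return struct.pack(">IBIII", 13, REQUEST, index, begin, length)

def decode(raw: bytes):
    """Return (message_id, fields) for a message with an integer-only payload."""
    length, message_id = struct.unpack(">IB", raw[:5])
    payload = raw[5:4 + length]
    fields = struct.unpack(">%dI" % (len(payload) // 4), payload)
    return message_id, fields
```

Since the downloader is the one issuing each request, it chooses which connected peer to ask for which part of a piece.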

A side may be interested, which means that it wants "pieces" that the other peer has. A side may also be choking, which means that it is refusing to upload for now, typically because it is busy sharing with other peers. Data transmission happens when a connection is both interested and unchoked. Peers will use "optimistic unchoking," or rotation of the choke list, to ensure that there is enough choke variability for the swarm to have a fair chance of progressing. Choking limits the number of peers being uploaded to at once, which keeps the communication switching overhead low enough for a peer to be useful to those it is connected to.

Because transmission only occurs when one side is interested and the other side is not choking, peers can play tit-for-tat: the peers which share the most freely are the ones which are able to access pieces the most rapidly. This localized enforcement of good behavior enables the network to scale upwards without collapsing. The hashing of all pieces sent ensures that no peer can "poison" the network by sending bad file fragments. This pervasive integrity checking was one of the things that allowed BitTorrent to succeed where its early competitors failed.
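The choking rules can be sketched as follows. This is not how any particular client does it: the class, the field names, and the slot counts are assumptions for illustration. The idea is to keep uploading to the interested peers that have given us the most, plus one randomly chosen peer as the optimistic unchoke.

```python
import random
from dataclasses import dataclass

@dataclass
class PeerLink:
    name: str
    download_rate: float           # bytes/s recently received from this peer
    peer_interested: bool = False  # the remote side wants pieces we have
    am_choking: bool = True        # we are currently refusing to upload to it

def can_upload_to(link: PeerLink) -> bool:
    """Data flows only when the remote is interested and we are not choking."""
    return link.peer_interested and not link.am_choking

def rechoke(links, regular_slots=3, rng=random):
    """Unchoke the best reciprocators plus one random 'optimistic' peer."""
    interested = sorted(
        (link for link in links if link.peer_interested),
        key=lambda link: link.download_rate,
        reverse=True,
    )
    unchoked = interested[:regular_slots]
    leftovers = interested[regular_slots:]
    if leftovers:
        # Optimistic unchoke: give one other peer a chance to prove itself.
        unchoked.append(rng.choice(leftovers))
    chosen = {link.name for link in unchoked}
    for link in links:
        link.am_choking = link.name not in chosen
    return unchoked
```

Calling `rechoke` periodically rotates the optimistic slot, which is what lets a newcomer with nothing to offer yet receive its first pieces.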


DHT and Magnet Links

BitTorrent uses a DHT protocol to enable peer discovery without requiring communication with the centralized tracker. DHT "nodes" are not the same thing as torrent "peers," although a computer can be both. Nodes listen for DHT requests over UDP, while peers listen for the BitTorrent peer protocol over TCP. BitTorrent clients include a DHT node, which operates mostly as a querying "client" node.

The Kademlia-like DHT works by giving each DHT node an ID. IDs have a "closeness" metric that is computed by XORing two IDs together and interpreting the result as an unsigned integer. Nodes will know about many nodes which have a low XOR distance to themselves, and about only a few nodes that have a high XOR distance.
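The distance metric is short enough to show in full. In this sketch the IDs are derived by hashing made-up labels, which is only a convenient way to get 160-bit values for the example.

```python
import hashlib

def xor_distance(a: bytes, b: bytes) -> int:
    """XOR two equal-length IDs and read the result as an unsigned integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest(target: bytes, node_ids: list[bytes]) -> bytes:
    """Pick the known node whose ID has the smallest XOR distance to target."""
    return min(node_ids, key=lambda node_id: xor_distance(node_id, target))

# Hypothetical 160-bit IDs, generated by hashing placeholder names.
known_nodes = [hashlib.sha1(b"node-%d" % i).digest() for i in range(8)]
target = hashlib.sha1(b"some torrent").digest()
nearest = closest(target, known_nodes)
```

The distance from an ID to itself is zero, and the metric is symmetric, so two nodes always agree on how far apart they are.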

A client makes a query for a torrent by using the hash of the metadata's info block as a key, and finding the node that it knows that is closest to the key. This node is then sent the request. If the node doesn't know of any peers for the torrent, it replies with the nodes closest to the key that it knows about, and the client queries those in turn. This process iteratively finds the nodes in the network that are the closest to the query's key. If the client can't find a node tracking the metadata's info hash, it will have to register itself as a peer for the key in the DHT by introducing itself to the nodes closest to the key.
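Here is a toy simulation of that iterative search. The network is a plain dictionary mapping each node to the nodes it knows, and the IDs are four-bit integers rather than 160-bit hashes; both are simplifications for the example, and a real client would stop querying far earlier than this exhaustive loop does.

```python
def iterative_find(network: dict, start: int, target: int, k: int = 2) -> int:
    """Repeatedly ask the closest unqueried node for nodes nearer the target."""
    shortlist = set(network[start])
    queried = {start}
    while True:
        candidates = sorted(shortlist - queried, key=lambda n: n ^ target)
        if not candidates:
            break
        node = candidates[0]
        queried.add(node)
        # The queried node replies with the k nodes it knows closest to target.
        reply = sorted(network[node], key=lambda n: n ^ target)[:k]
        shortlist.update(reply)
    return min(shortlist, key=lambda n: n ^ target)

# Hypothetical network: each node knows only a couple of others.
network = {
    0b0001: [0b1000, 0b0100],
    0b1000: [0b1100, 0b0001],
    0b0100: [0b0001, 0b0110],
    0b1100: [0b1110, 0b1000],
    0b1110: [0b1111, 0b1100],
    0b1111: [0b1110],
    0b0110: [0b0100],
}
found = iterative_find(network, start=0b0001, target=0b1111)
```

Starting from node 0001, which knows nothing near the target, the search hops through 1000, 1100, and 1110 before learning about 1111.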

In order to prevent a bad actor from registering other peers for torrents, the DHT includes a paper trail. Each response to a query for peers includes a token. This token must be presented by the peer when it later registers itself for the key with the node that issued it. Tokens are only valid for approximately ten minutes.
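One way to build such tokens, loosely following the scheme BEP 5 describes (a hash of the requester's IP address and a secret that rotates every five minutes), might look like the sketch below. The class and method names are hypothetical, and the timer that would call `rotate()` is left out.

```python
import hashlib
import hmac
import os

class TokenIssuer:
    """Issue tokens bound to an IP address and to a rotating secret."""

    def __init__(self):
        self.current = os.urandom(16)
        self.previous = os.urandom(16)

    def rotate(self):
        # Assumed to be called every five minutes, so a token stays
        # acceptable for somewhere between five and ten minutes.
        self.previous, self.current = self.current, os.urandom(16)

    def _make(self, ip: str, secret: bytes) -> bytes:
        return hashlib.sha1(ip.encode("ascii") + secret).digest()

    def issue(self, ip: str) -> bytes:
        return self._make(ip, self.current)

    def is_valid(self, ip: str, token: bytes) -> bool:
        return any(
            hmac.compare_digest(token, self._make(ip, secret))
            for secret in (self.current, self.previous)
        )
```

Because the token is derived from the requester's address, a node cannot take a token it received and use it to announce on behalf of some other IP.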

A node constructs its list of known nodes by searching the DHT for nodes closest to its own node ID. Most of the nodes it knows will have nearby IDs. A few exceptions will exist because of nodes observed through other network operations. In this way, nodes tend to construct a very "localized" view of the network that makes it faster to narrow in on a single key.

This DHT is quite simple: it only associates peers with a torrent. In order to get the querying node enough information to begin downloading the torrent, the swarm must serve the torrent's metadata file as well. This newer feature is what makes "magnet" links work, and enables a peer to complete an entire download using only the info hash. Trackers using magnet links need only provide links with hashes, rather than indexing and serving a collection of torrent metadata files.
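A magnet link is little more than the info hash wrapped in a URI. The sketch below builds and parses the common hex form, `magnet:?xt=urn:btih:<infohash>`; the function names and the sample file name are assumptions, and other encodings of the hash are not handled.

```python
import hashlib
from urllib.parse import parse_qs, urlencode

PREFIX = "urn:btih:"

def make_magnet(info_hash: bytes, name: str = "") -> str:
    """Build a magnet URI from a 20-byte info hash and an optional name."""
    uri = "magnet:?xt=" + PREFIX + info_hash.hex()
    if name:
        uri += "&" + urlencode({"dn": name})
    return uri

def parse_info_hash(uri: str) -> bytes:
    """Extract the info hash from a hex-encoded BitTorrent magnet URI."""
    query = parse_qs(uri.partition("?")[2])
    for topic in query.get("xt", []):
        if topic.startswith(PREFIX):
            return bytes.fromhex(topic[len(PREFIX):])
    raise ValueError("no BitTorrent info hash in magnet URI")

# Hypothetical info hash, standing in for the hash of a real info block.
example_hash = hashlib.sha1(b"placeholder info block").digest()
link = make_magnet(example_hash, name="example file.iso")
```

Everything else a client needs, starting with the list of peers, is found by looking that hash up in the DHT.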

Private Torrents

BitTorrent is a very inclusive network in its default state. The Distributed Hash Table, peer exchange, and local service discovery can all help a peer find other peers without having to rely on the tracker. While this is typically a good thing, it can be undesirable to remove the ability to control file distribution centrally.

One frequent use case for centralized control is to enforce a share ratio. When a community shares many files, it becomes important to enforce that peers upload an acceptable ratio of data to what they download.

This fine-grained access control is done by restricting peer information querying to those peers that the private tracker decides should be able to download the file. This is really security through obscurity.
Once an intruder peer has obtained the IP address and port of a peer, regardless of the source, the intruder can initiate a connection to that peer and trade pieces with the peer. Once in the swarm, the intruder is granted equal treatment as all other peers. 
Source: http://www.bittorrent.org/beps/bep_0027.html

What happens is that the torrent's metadata file includes a "private" flag that tells peers to only use the tracker to exchange peers. Furthermore, a client may only use the peers from a single private tracker at a given time for a given torrent. This prevents a peer from uploading a private metadata file to a public tracker and having the existing swarm serve the file.
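In client terms, the flag boils down to a check like the one below. BEP 27 places `private=1` in the info block; the function name and the labels for the peer sources are made up for this sketch.

```python
ALL_SOURCES = {"tracker", "dht", "peer-exchange", "local-discovery"}

def allowed_peer_sources(info: dict) -> set:
    """Return the peer discovery methods a client may use for this torrent."""
    if info.get("private") == 1:
        # Private torrents: only the tracker may hand out peers.
        return {"tracker"}
    return set(ALL_SOURCES)

public_info = {"name": "example.iso", "length": 1048576}
private_info = {"name": "example.iso", "length": 1048576, "private": 1}
```

Since the flag lives inside the info block, it is covered by the info hash, so stripping it out produces a different torrent with a different swarm.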


Future

We're seeing a lot of innovation in the BitTorrent sphere recently.

BitTorrent's power is proportional to its convenience. The associations with piracy have led to BitTorrent software that sometimes requires jumping through a lot of hoops to use. You can only download from BitTorrent what peers are seeding through BitTorrent. Convenience is largely the reason that many consumers have abandoned piracy torrents in exchange for services like Netflix and Spotify.

PopcornTime (https://popcorntime.sh/) is an example of the power of BitTorrent with good content indexing and a nice UI. While it is plagued by the fact that it is a piracy-focused tool, PopcornTime represents a fairly flexible CDN system for popular media. One can imagine clones for open-access media and for news. What's worth noting is that the cost of running PopcornTime is absolutely minuscule for the software creators, especially compared to the behemoth infrastructure of Netflix. With good copy protection on media, BitTorrent may one day be seen as the powerful content-agnostic technology that it is.

BitTorrent is useful for a lot more than shuffling around large files though. WebTorrent uses the newer WebRTC browser features, which enable P2P data channels. Browsers are now able to act as peers, making the ecosystem much more flexible. It's worth noting that WebTorrent peers are not compatible with BitTorrent peers due to the use of WebRTC for transport, rather than naked TCP sockets. As well as serving large media files, torrents can now serve static websites. Peers can send messages and media between each other in realtime chats without relying on a centralized server. With enough adopters, WebTorrent may invert the economics of web hosting. Popular sites will be cheaper to host than less popular sites, without having to fall back on advertisement networks.

The web2web (https://github.com/elendirx/web2web) project uses Bitcoin and BitTorrent together to replace the entire web stack. The blockchain is searched for the last outgoing transaction from a given address. This transaction uses the OP_RETURN Bitcoin opcode to mark an output as unspendable, and stashes the torrent infohash in that output's data. This hash is enough to fetch the website being served through WebTorrent.

CacheP2P (http://www.cachep2p.com/) uses WebTorrent to offer a website cache. The reasoning is that popular content is likely to have a nearby peer which is closer than the nearest CDN server. CacheP2P, it's worth noting, costs the website maintainer nothing and costs each of the site's visitors a very small amount of traffic. This stands in contrast to the expensive caching infrastructure necessary to blanket a country or the globe. If a site becomes more popular in a certain area, CacheP2P will bring more cache servers to the region automatically. This latter point is a good enough argument alone.


Resources

http://www.bittorrent.org/beps/bep_0003.html
http://www.bittorrent.org/beps/bep_0004.html
http://www.bittorrent.org/beps/bep_0005.html
http://www.bittorrent.org/beps/bep_0006.html
http://www.bittorrent.org/beps/bep_0009.html
http://www.bittorrent.org/beps/bep_0023.html
http://www.bittorrent.org/beps/bep_0027.html
https://github.com/elendirx/web2web
https://webtorrent.io/
https://popcorntime.sh/en

13 comments:

  1. Bittorrent is going to get a lot of innocent people thrown in jail once hackers figure out that the justice system in America is stupid enough to not know how torrenting works with regards to kiddy porn.

    Replies
    1. In other words, said hacker (say from russia or china) hacks your PC, opens up a background torrenting connection downloading and streaming kiddy porn, FBI or whoever gets tipped off about said kiddy porn, and then traces the streaming IP to your house where your PC has 101 local copies of said offending item, then you are facing life in prison in some states. (like oklahoma)

      I have a friend that is having this exact thing happen to him because he was stupid enough to clean other people's windows PC's of viruses and connecting them to the internet. The pc had run at startup torrenting software running in the background and it was streaming out child pornography from his IP without his knowledge.

      Imagine if this happened to 1% of all the people compromised in botnets. The motivation is there certainly.

  2. I have a small doubt. When a peer(say A) wants to download a file which is present with 4 other peers (say B, C, D, E). And the file has 32 "pieces" or hashes. How does A ask for fragments of file from B, C, D and E without repeating the same fragment request to different peers ? For example, It might be possible that A is asking for piece number 15 , and both D and E are sending that fragment. How is this avoided ?

    Replies
    1. It's generally solved with a handshake where the client decides from whom it's going to accept the offered pieces.

      Some client might actually download redundant data during this step, to gauge the possible upload rates of the different peers.

    2. Oh meaning during the handshake, the client will mention a list of pieces it wishes to download from each peer. So in way, A already decides which pieces to ask for from each peer. Right? Something like A decides pieces 1,3,5,7 from B and 2,4,6,8 from C. And then it initiates request to the peers.
