BitTorrent is both ambitious and simple. BitTorrent is a P2P protocol in which peers coordinate to distribute requested files. In order to resist downtime due to real-world seizure of computers, BitTorrent has had to progress to a fully distributed architecture, without any single point of failure. This is an impressive technical feat.
Even more impressive is that BitTorrent gets faster with additional content-fetchers, rather than slower. The classic economics of content distribution is suddenly inverted, rewarding high-desirability content.
It's no surprise then that BitTorrent is used nowadays for everything from sharing Linux ISO files to live broadcast streaming of sports and politics. BitTorrent's name is still controversial in many places because of its role as subversive software. BitTorrent's power made it the first choice for piracy, which led many to conclude that BitTorrent is only useful for piracy. While many ISPs and externally-administered networks attempt to block and trace BitTorrent, the fight has largely been lost.
By not placing restrictions on peers, BitTorrent opens itself up to a universe of attacks. Like other architectures, a combination of limited observability and sound mathematics is the solution. As we will see, the architecture prevents an evil actor from serving a corrupted file or causing undue load on the BitTorrent network.
Lastly, BitTorrent is forward-thinking. It contains an extension protocol that allows clients to design protocols that alter the behavior of peers, and enables two peers to fall back intelligently on the extensions that both support. At the bottom of this is the basic peer protocol, which ensures that clients can agree on enough to simply serve the file even if they share no extensions.
The Peer Protocol
When a peer wants to start sharing a file, they construct a metadata file that describes the attributes of the file as well as a number of options. BitTorrent uses bencoding for most data sent: integers, lists, and dictionaries are wrapped in a leading type character and a trailing "e", while strings are prefixed with their length. The metadata describes the files in the torrent, and also includes a SHA-1 hash of each of the "pieces" or file fragments in the torrent. These fragments can be downloaded individually, allowing for streaming or for selective downloading.
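To make the encoding concrete, here is a minimal sketch of a bencoder. The function name `bencode` and the choice to accept Python `str` values (encoded as UTF-8) are conveniences assumed for this example, not part of any particular client.

```python
def bencode(value) -> bytes:
    """Encode ints, strings, lists and dicts in bencoded form."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but bencoding has no boolean type.
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        # Integers: i<digits>e
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Strings: <length>:<raw bytes>
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        # Lists: l<items>e
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionaries: d<key><value>...e, with keys as strings in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode({"name": "hello.txt", "length": 11}))
```

Sorting the dictionary keys matters: it gives every dictionary exactly one encoded form, so hashing the encoded bytes yields a stable value.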
The file's attributes are known as the "info" block, which is what uniquely defines the torrent. The info block's hash is the torrent's unique identifier in the BitTorrent swarm of peers. The metadata file also announces a tracker that the torrent will be associated with. The tracker is kept outside of the info block so that multiple trackers can track the same torrent under the same infohash.
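As a rough illustration of why the tracker lives outside the info block, the sketch below hashes a hand-written, already bencoded info block and wraps it in two metadata files that name different trackers. The file name, piece length, and tracker URLs are made up for the example.

```python
import hashlib

# A hypothetical single-file info block, bencoded by hand. The "pieces"
# value is the SHA-1 digest of the file's only piece.
piece_digest = hashlib.sha1(b"hello world").digest()
info = (
    b"d"
    b"6:lengthi11e"
    b"4:name9:hello.txt"
    b"12:piece lengthi16384e"
    b"6:pieces20:" + piece_digest +
    b"e"
)


def metadata_file(announce_url: bytes, bencoded_info: bytes) -> bytes:
    """Wrap an info block in a metadata dictionary that names one tracker."""
    return b"d8:announce%d:%s4:info%se" % (
        len(announce_url), announce_url, bencoded_info)


def info_hash(bencoded_info: bytes) -> str:
    """The torrent's identifier: SHA-1 over the bencoded info block only."""
    return hashlib.sha1(bencoded_info).hexdigest()


torrent_a = metadata_file(b"http://tracker-a.example/announce", info)
torrent_b = metadata_file(b"http://tracker-b.example/announce", info)

# The two files differ, but both embed the same info bytes, so they
# identify the same torrent.
print(torrent_a != torrent_b, info_hash(info))
```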
This metadata file is half of what a downloader needs to know to download the file. The other half is the list of peers serving the torrent. Conventionally, a peer will query the torrent tracker for a list of peers serving the file. The distributed hash table, peer exchange, and local service discovery are all other methods. We will discuss the first later. The latter two can be thought of as "gossip" protocol extensions that allow peers to become known by the swarm.
Once a peer has a list of peers, and has connected to each of them over TCP (or uTP, the Micro Transport Protocol, not covered here), it uses the peer protocol to fetch the pieces of the files. These peer connections are bidirectional and have attributes set on them by either side. Peers will announce when they have finished downloading a piece, so that the peers connected to them know whether they want anything from them.
A side may be interested, which means that it wants "pieces" that the other peer has. A side may also be choking, which means that it is refusing to upload to the other peer for now, typically because it is busy sharing with other peers. When one side is interested and the other is not choking it, data transmission happens. Peers will use "optimistic unchoking," or rotation of the choke list, to ensure that there is enough choke variability for the swarm to have a fair chance of progressing. Choking limits the number of peers a client uploads to at once, because TCP congestion control behaves poorly when sending over many connections simultaneously. This keeps the switching overhead low enough for a peer to be useful to those it is serving.
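The per-connection state can be sketched as four flags, two for each direction. The class and field names here are illustrative assumptions rather than anything mandated by the protocol. Connections start out choked and not interested on both sides.

```python
from dataclasses import dataclass


@dataclass
class PeerLink:
    """One side's view of a single peer connection (illustrative names)."""
    am_choking: bool = True        # we refuse to upload to the peer
    am_interested: bool = False    # the peer has pieces we want
    peer_choking: bool = True      # the peer refuses to upload to us
    peer_interested: bool = False  # we have pieces the peer wants

    def can_download(self) -> bool:
        # Data flows to us only if we want it and the peer allows it.
        return self.am_interested and not self.peer_choking

    def can_upload(self) -> bool:
        # Data flows from us only if the peer wants it and we allow it.
        return self.peer_interested and not self.am_choking


link = PeerLink()
link.am_interested = True   # we sent "interested"
link.peer_choking = False   # the peer sent "unchoke"
print(link.can_download(), link.can_upload())
```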
Because data only flows to an interested peer that has been unchoked, peers can play tit-for-tat: the peers which share the most freely are the ones which are able to access pieces the most rapidly. This localized enforcement of good behavior enables the network to scale upwards without collapsing. The hashing of every piece sent ensures that no peer can "poison" the network by sending bad file fragments. This pervasive integrity checking was one of the things that allowed BitTorrent to succeed where its early competitors failed.
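The integrity check itself is small. The sketch below assumes the metadata stores the piece hashes as one concatenated string of 20-byte SHA-1 digests. The helper names and the tiny piece length are chosen only for the example.

```python
import hashlib

SHA1_LEN = 20  # bytes per SHA-1 digest


def hash_pieces(data: bytes, piece_length: int) -> bytes:
    """Concatenate the SHA-1 digest of each fixed-size piece of the data."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def verify_piece(index: int, piece: bytes, pieces_field: bytes) -> bool:
    """Check a received piece against the digest stored for its index."""
    expected = pieces_field[index * SHA1_LEN:(index + 1) * SHA1_LEN]
    return hashlib.sha1(piece).digest() == expected


content = b"abcdefghij"
pieces_field = hash_pieces(content, piece_length=4)  # pieces: abcd, efgh, ij

print(verify_piece(1, b"efgh", pieces_field))  # genuine piece
print(verify_piece(1, b"evil", pieces_field))  # corrupted piece is rejected
```

A downloader that gets a mismatch simply discards the piece and requests it again, ideally from a different peer.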
DHT and Magnet Links
BitTorrent uses a DHT protocol to enable peer discovery without requiring communication with the centralized tracker. DHT "nodes" are not the same thing as torrent "peers," although a computer can be both. Nodes listen for DHT requests over UDP, while peers listen for the BitTorrent peer protocol over TCP. BitTorrent clients include a DHT node, which operates mostly as a querying "client" node.
A client makes a query for a torrent by using the hash of the metadata's info as an ID, and finding the node that it knows that is closest to the key. This node is then sent the request. If the node doesn't know of any peers for the torrent, it replies with the nodes closest to the ID that it knows, and the client repeats the request against those. This process iteratively finds the nodes in the network that are closest to the query's key. If the client can't find any peers for the metadata's info hash, it will have to register itself as a peer for the key by introducing itself to the nodes closest to the key.
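The iterative search can be sketched against a toy in-memory network. This is a simplification under stated assumptions: IDs are 4-bit integers instead of 160-bit values, closeness is the XOR of two IDs, and the client follows only the single closest node at each step, whereas real clients query several nodes in parallel. The `network` layout and the `lookup` function are hypothetical.

```python
def distance(a: int, b: int) -> int:
    """XOR metric: a smaller result means the two IDs are closer."""
    return a ^ b


def lookup(network: dict, start: int, target: int):
    """Walk toward `target`; return (peers found, last node queried)."""
    current = start
    visited = set()
    while True:
        visited.add(current)
        node = network[current]
        if target in node["peers"]:
            return node["peers"][target], current
        # The node's reply: the other nodes it knows about.
        candidates = [n for n in node["known"] if n not in visited]
        if not candidates:
            return [], current
        closest = min(candidates, key=lambda n: distance(n, target))
        if distance(closest, target) >= distance(current, target):
            return [], current  # no progress: current is the closest node
        current = closest


network = {
    0b0001: {"known": {0b1000, 0b0100}, "peers": {}},
    0b0100: {"known": {0b0001}, "peers": {}},
    0b1000: {"known": {0b1010, 0b0001}, "peers": {}},
    0b1010: {"known": {0b1000}, "peers": {0b1011: ["203.0.113.5:6881"]}},
}

print(lookup(network, start=0b0001, target=0b1011))
```

When the walk ends without peers, the last node queried is the closest one found, which is where the client would announce itself.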
In order to prevent a bad actor from registering other peers for torrents, the DHT includes a paper trail. Each node that answers the query also returns a token. A peer that then registers itself with that node as responsible for the key must present this token, which ties the registration to the address that made the query. Tokens are only valid for approximately ten minutes.
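One way a node might issue and check such tokens is to hash the requester's IP address together with a secret that rotates every five minutes, accepting tokens made with the current or the previous secret. That gives a lifetime of roughly ten minutes. This is a sketch, and the class and method names are assumptions for illustration.

```python
import hashlib
import hmac
import os


class TokenIssuer:
    """Issues tokens bound to a requester's IP address (illustrative)."""

    def __init__(self) -> None:
        self.current_secret = os.urandom(16)
        self.previous_secret = os.urandom(16)

    def rotate(self) -> None:
        """Call on a timer, e.g. every five minutes."""
        self.previous_secret = self.current_secret
        self.current_secret = os.urandom(16)

    @staticmethod
    def _token(ip: str, secret: bytes) -> bytes:
        return hashlib.sha1(ip.encode("ascii") + secret).digest()

    def issue(self, ip: str) -> bytes:
        return self._token(ip, self.current_secret)

    def is_valid(self, ip: str, token: bytes) -> bool:
        # Accept tokens made with the current or the previous secret.
        return any(
            hmac.compare_digest(token, self._token(ip, secret))
            for secret in (self.current_secret, self.previous_secret)
        )


issuer = TokenIssuer()
token = issuer.issue("198.51.100.7")
print(issuer.is_valid("198.51.100.7", token))   # same address: accepted
print(issuer.is_valid("198.51.100.99", token))  # other address: rejected
```

Because the token is derived from the IP address, the node does not need to remember which tokens it handed out.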
A node builds its list of known nodes by searching the DHT for nodes closest to its own node ID. Most of the nodes it knows will have nearby IDs. A few exceptions will exist because of nodes observed through other network operations. In this way, nodes tend to construct a very "localized" view of the network that makes it faster to narrow in on a single key.
This DHT is quite simple: it only associates peers with a torrent. In order to get the querying node enough information to begin downloading the torrent, the network must serve the torrent's metadata file as well. This newer feature is what makes "magnet" links possible, and enables a peer to complete an entire download using only the infohash. Trackers using magnet links need only provide links with hashes, rather than indexing and serving a collection of torrent metadata files.
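For illustration, a magnet link is a URI whose query string carries the infohash. The sketch below pulls out the hash plus the optional display name and tracker list. It handles only the 40-character hexadecimal form of the hash, and the example link and its hash are made up.

```python
from urllib.parse import parse_qs

BTIH_PREFIX = "urn:btih:"


def parse_magnet(link: str) -> dict:
    """Extract the infohash, display name and trackers from a magnet link."""
    scheme = "magnet:?"
    if not link.startswith(scheme):
        raise ValueError("not a magnet link")
    params = parse_qs(link[len(scheme):])  # also decodes percent-escapes
    for topic in params.get("xt", []):
        if topic.startswith(BTIH_PREFIX):
            info_hash = topic[len(BTIH_PREFIX):].lower()
            break
    else:
        raise ValueError("no BitTorrent infohash in link")
    if len(info_hash) != 40:
        raise ValueError("only hex-encoded infohashes are handled here")
    bytes.fromhex(info_hash)  # raises ValueError if it is not valid hex
    return {
        "info_hash": info_hash,
        "name": params.get("dn", [None])[0],
        "trackers": params.get("tr", []),
    }


example = (
    "magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
    "&dn=hello.txt&tr=udp%3A%2F%2Ftracker.example%3A6969"
)
print(parse_magnet(example))
```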
BitTorrent is a very inclusive network in its default state. The Distributed Hash Table, peer exchange, and local service discovery can all help a peer find other peers without having to rely on the tracker. While this is typically a good thing, it can be undesirable to remove the ability to control file distribution centrally.
One frequent use case for centralized control is to enforce a share ratio. When a community shares many files, it becomes important to enforce that peers upload an acceptable ratio of data to what they download.
This fine-grained access control is done by restricting peer information querying to those peers that the private tracker decides should be able to download the file. This is really security through obscurity.
Once an intruder peer has obtained the IP address and port of a peer, regardless of the source, the intruder can initiate a connection to that peer and trade pieces with the peer. Once in the swarm, the intruder is granted equal treatment as all other peers.
Source: http://www.bittorrent.org/beps/bep_0027.html
What happens is that the torrent's metadata file includes a "private" flag that tells peers to only use the tracker to exchange peers. Furthermore, a client may only use the peers from a single private tracker at a given time for a given torrent. This prevents a peer from uploading a private metadata file to a public tracker and having the existing swarm serve the file.
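On the client side, honoring the flag amounts to a check before any peer discovery starts. A minimal sketch, assuming the metadata's info block has already been decoded into a dictionary and the flag is stored as the integer 1 under a "private" key. The source labels are made up for the example.

```python
ALL_SOURCES = {"tracker", "dht", "peer_exchange", "local_discovery"}


def allowed_peer_sources(info: dict) -> set:
    """Return the peer discovery methods a client may use for this torrent."""
    if info.get("private") == 1:
        # Private torrent: peers may only come from the tracker.
        return {"tracker"}
    return set(ALL_SOURCES)


print(allowed_peer_sources({"name": "hello.txt", "private": 1}))
print(sorted(allowed_peer_sources({"name": "hello.txt"})))
```

Since the flag sits inside the info block, removing it changes the infohash and therefore produces a different torrent with its own, separate swarm.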
We're seeing a lot of innovation in the BitTorrent sphere recently.
BitTorrent's power is proportional to its convenience. The associations with piracy have led to BitTorrent software that sometimes makes users jump through a lot of hoops. You can only download from BitTorrent what peers are seeding through BitTorrent. Convenience is largely the reason that many consumers have abandoned piracy torrents in exchange for services like Netflix and Spotify.
PopcornTime (https://popcorntime.sh/) is an example of the power of BitTorrent with good content indexing and a nice UI. While it is plagued by the fact that it is a piracy-focused tool, PopcornTime represents a fairly flexible CDN system for popular media. One can imagine clones for open-access media and for news. What's worth noting is that the cost of running PopcornTime is absolutely minuscule for the software creators, especially compared to the behemoth infrastructure of Netflix. With good copy protection on media, BitTorrent may one day be seen as the powerful content-agnostic technology that it is.
The web2web (https://github.com/elendirx/web2web) project uses Bitcoin and BitTorrent together to replace the entire web stack. The blockchain is searched for the last outgoing transaction from a given address. This transaction uses Bitcoin's OP_RETURN opcode, which marks an output as unspendable and leaves room for a small data payload, to stash the torrent infohash. This hash is enough to fetch the website being served through WebTorrent.
CacheP2P (http://www.cachep2p.com/) uses WebTorrent to offer a website cache. The reasoning is that popular content is likely to have a nearby peer which is closer than the nearest CDN server. CacheP2P, it's worth noting, costs the website maintainer nothing and costs each of the site's visitors a very small amount of traffic. This stands in contrast to the expensive caching infrastructure necessary to blanket a country or the globe. If a site becomes more popular in a certain area, CacheP2P will bring more cache servers to the region automatically. This latter point is a good enough argument alone.