
For a while I’ve been wondering if a hybrid Bittorrent/Reed-Solomon system would be a more effective swarming download system. 

Just to make this clear ahead of time, I don’t believe the benefits of the Bittorrent/Reed-Solomon system would kick in until the torrent had been running for a while.  The reason is that Bittorrent has a built-in “rarest first” algorithm (described in this Bittorrent paper).  The goal of the “rarest first” algorithm is to replicate the rarest pieces in the system first, while leaving the most common pieces - the ones most likely to stick around - until the end. 

In a Bittorrent network where a seed node exists for the entire period of the torrent’s existence, this parity system would provide no benefit.  This system really shines, however, by increasing the probability that a client node can retrieve enough data from a seedless network to become a seed itself.

So, on to the idea.  I’m proposing that instead of seeding pieces of the original file at the start, the seed distributes a large number of Reed-Solomon-encoded “parity blocks” (perhaps twice as many as the original number of Bittorrent blocks - maybe more).  These blocks would be “virtual” - computed in real time as they are requested by clients.  Reed-Solomon encoding is fast: I’ve seen throughput of ~360MB/s on my Athlon 2100+, far more than enough for real-time encoding and distribution (the blocks could always be pre-calculated and cached on a less powerful system). 

These parity blocks would be distributed using the original Bittorrent algorithms (including rarest-first).  For simplicity, I’ll use the case of 100% redundancy - twice the number of blocks necessary to reconstruct the file.  Once a Bittorrent client has received enough blocks to restore the original file (50% of the blocks), it runs the Reed-Solomon decoding process to recreate the original file.  From the original file, it can then continue as a seed - and can calculate the extra parity blocks as necessary.

The goal of the system is to ensure that each Bittorrent client has as diverse a set of blocks as possible.  Imagine a situation with an infinite number of Reed-Solomon-encoded blocks, where any n of them are enough to reconstruct the target file.  Each downloading client gets the next block in the infinite sequence from the server.  Once a client has received any n of these blocks, it can then generate any of the other possible encoded blocks. 
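
To make that concrete, here’s a minimal sketch of the idea in Python.  It’s purely illustrative - the function names are mine, and a real implementation would use a fast GF(2^8) or GF(2^16) Reed-Solomon codec rather than naive Lagrange interpolation over a prime field:

    # Sketch: an "infinite" stream of coded blocks over GF(65537).
    # Block i is the value at x = i of the unique degree-(n-1) polynomial
    # through the n data symbols at x = 0..n-1, so the first n blocks are
    # the data itself and every block i >= n is a fresh parity block.
    import random

    P = 65537  # prime modulus; each symbol carries up to 16 bits

    def eval_at(points, x):
        """Lagrange-evaluate the polynomial through (xi, yi) pairs at x."""
        total = 0
        for xi, yi in points:
            num = den = 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
        return total

    def make_block(data, i):
        """Compute block i of the conceptually infinite coded stream."""
        return data[i] if i < len(data) else eval_at(list(enumerate(data)), i)

    def reconstruct(blocks, n):
        """Recover the n data symbols from any n distinct (index, value) pairs."""
        return [eval_at(blocks[:n], x) for x in range(n)]

    data = [104, 101, 108, 108, 111, 33]                    # n = 6 symbols
    coded = [(i, make_block(data, i)) for i in range(20)]   # 20 of infinitely many
    assert reconstruct(random.sample(coded, 6), 6) == data  # any 6 blocks suffice

Once a client has reconstructed the data, it can call make_block for any index - which is exactly the “become a seed and generate further parity blocks” step.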

If this hypothetical ad-hoc network suddenly lost all of its seed nodes, the remaining clients would only need to hold n distinct blocks between them to reconstruct the file.  I believe that as the number of possible blocks goes up, the probability of this occurring is greater than if only n possible blocks existed.

Perhaps an example network might show the advantages:

In this system, there are 200 parts to the file.  In the original Bittorrent system, this equates to 200 blocks.  In the Reed-Solomon-encoded system with 100% redundancy, this equates to 400 blocks.  There are 30 clients trying to download this file.  When the seed node gives out, each client has 10 blocks of the file (5% in the Bittorrent case, 2.5% in the Reed-Solomon case).

First, the Bittorrent case:

So, the probability that a client has any given block is 5%, meaning there is a 95% chance that a client does not have a particular block.  The probability that none of the 30 clients has a particular block is then 95%^30 = 21.46%, and the probability that at least one client has it is (1-21.46%) = 78.54%.  The probability that every one of the 200 blocks is held by at least one client is then 78.54%^200 ≈ 1x10^-19 %.
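
That arithmetic is easy to check in Python:

    p_none = (1 - 10 / 200) ** 30        # no one of the 30 clients has a given block
    print(f"{p_none:.2%}")               # 21.46%
    print(f"{(1 - p_none) ** 200:.2e}")  # ~1.03e-21, i.e. ~1x10^-19 percent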

Now, in the Reed-Solomon case:

The probability that a client has any given block is 2.5%, meaning there is a 97.5% chance that a client does not have a particular block.  The probability that none of the 30 clients has a particular block is then 97.5%^30 = 46.79%, and the probability that at least one client has it is (1-46.79%) = 53.21%.  The probability that enough blocks remain in the system is calculated using the binomial distribution with:

  • Probability of success, p = 53.21%
  • Number of trials, n = 400

We take the probability that the number of successes is greater than or equal to 200.  The probability that not enough blocks exist in the system (i.e. only 0-199 blocks are available when 200 are required) is now:

Σ_{k=0}^{199} B(k; n=400, p=53.21%) = 9.08%.

There is now a 90.92% chance (the complement of the above probability) that the system can recover enough blocks to produce another valid seed in the future.  This is significantly better than the 1x10^-19 % chance from before.
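
The tail sum itself is straightforward to verify with Python’s exact integer binomial coefficients (scipy.stats.binom.cdf would give the same answer):

    from math import comb

    p = 1 - (1 - 10 / 400) ** 30   # 53.21%: at least one client has a given block
    p_fail = sum(comb(400, k) * p ** k * (1 - p) ** (400 - k) for k in range(200))
    print(f"{p:.2%} {p_fail:.2%} {1 - p_fail:.2%}")  # ~53.21% ~9.1% ~90.9%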

Thoughts?


I posted the analysis from earlier to the Bittorrent development mailing list. I received a reply from Bram Cohen illustrating one flaw in the argument:

You make the assumption that the probabilities of different clients having a given block are decorrelated, which is wrong. That’s the point of rarest first, it makes it so that if one peer doesn’t have something it’s more likely that another peer will.

Yeah - good point. I had neglected to factor that in - and it makes all the difference. In fact, with rarest first, it’s quite likely that both of the situations described above have a 100% recovery probability (reading the source, it appears that rarest first kicks in after you’ve downloaded 4 blocks, and will ensure that each block has at least 3 peers hosting it).

I’ll put forward a slightly different case - it may be so rare that it’s not worth worrying about. Still, it’s good to at least understand where the (potential) weaknesses are, even if the probability of being exposed to them is fairly low.

We can take the situation from before and add another network failure: one of the remaining nodes drops off. I’ll use the case where the rarest-first algorithm has ensured that each node has a fully disjoint set of pieces from the initial download (though there will likely be *some* overlap due to the random first downloads).

Each of the 30 nodes has 5% of the 200 pieces = 10 blocks, yielding 300 block copies in the system: 50% of the blocks are hosted by a single peer, while the other 50% are hosted by two. Rarest first has kicked in here, but hasn’t completely propagated the blocks out yet.

The probability that any given block drops out of the system in this event is the weighted sum of two cases:

  1. Only one copy of the block in the system: 50% * the probability that the block was hosted on the node that went down. There are 30 peers, so that chance is 1/30.
  2. Two copies of the block in the system: 0% (the other copy is still there to download)

The probability is then 1/60 that a given block went down, or 59/60 that it survived. Across all 200 blocks, the probability that every block survived is (59/60)^200 ≈ 3.5% (treating blocks as independent; only the 100 single-copy blocks are actually at risk, and (29/30)^100 gives nearly the same figure). That leaves a roughly 96.5% chance that the extra node dropping off took an essential block with it, capping the network at a maximum completion percentage of 199/200 = 99.5%.

In the Reed-Solomon case (400 blocks total, 200 required), there would still be 290 distinct blocks left, more than the 200 required to recover the system.

As I mentioned before, both of these cases assume that the downloaded blocks are fully disjoint. The math would certainly be more difficult without this assumption, however.
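
For what it’s worth, here’s a quick Monte Carlo sketch of the plain-Bittorrent case (my own toy setup using the numbers above; the random dealing means the node sets are only approximately disjoint, and because the losses are correlated through the single dropped node, the simulated failure rate comes out slightly above the independence estimate - roughly 98%):

    import random

    TRIALS, BLOCKS, NODES, PER_NODE = 20_000, 200, 30, 10

    failures = 0
    for _ in range(TRIALS):
        # 300 block copies: every block once, plus 100 distinct random duplicates
        copies = list(range(BLOCKS)) + random.sample(range(BLOCKS), 100)
        random.shuffle(copies)
        nodes = [set(copies[i * PER_NODE:(i + 1) * PER_NODE]) for i in range(NODES)]
        nodes.pop(random.randrange(NODES))     # one more node drops off
        if len(set().union(*nodes)) < BLOCKS:  # a sole-copy block went with it
            failures += 1

    print(f"essential block lost in {failures / TRIALS:.1%} of runs")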

So - I believe this would be a case where redundant encoding is better, but how likely is the network to encounter this situation? Is it worth programming for this case? Is this case representative of a lot of torrents?

The question remains open. 


I am so tired of hearing about Longhorn.  It’s a shame that the blogs that are updating most frequently in the last few weeks are the ones from the .NET-praising crowd.  Information addiction can be a painful thing sometimes.  My inspiration for this rant comes from this article.

Anyways, I hate to burst everyone’s bubble, but it seems like all of the arguments I’m hearing were made a few years back.  What was that technology again?  Oh yeah, XML. 

Let me quote some of ozzie.net’s points here:

No - the “big deal” about WinFS IMHO isn’t “search”. Like Web Services - it’s about the fact that it’s an attempt to get a higher level of interoperability between programs through agreement on schemas.

Hrm.  Sounds a little familiar, eh?  Yep - that was what we were saying about XML.  Now, I’m not putting XML down.  It’s a great tool and I love it (and use it often).  But how much agreement have we actually reached on schemas?  Look at the example mentioned in this article: RSS.  I can’t even get my feed to validate with a .dtd reference in it.  Each RSS feed-reader needs its own set of hacks to ensure that it can read every RSS page out there.  And XML is an open standard, too!  I’ve been in the Oil & Gas industry through the entire lifetime of the XML revolution and I have yet to see a strong common schema emerge.  I can’t speak for other industries, but does the phrase “yeah it’s nice, but we need something just a liiiiiiiitle bit different for our data” sound familiar?

Agreement on common formats and protocols can yield powerful network effects and unintended positive consequences: this is already quite apparent in how people are beginning to leverage Amazon’s Web Services

Now, if there’s anything Web Services has taught us, it’s that each vendor will describe its own Web Services API.  The real benefit of Web Services is that they’re easy to “consume”.  How many companies use Amazon’s Web Services API?  Hmm….  Oh yeah, one.  How many companies use Google’s Web Services API?  One!  See a pattern here?  The only thing these have in common is that they use Web Services and SOAP.  It’s an If-I-Want-To-Communicate-With-You-I’ll-Use-Your-Schema sort of thing - and a When-I-Want-To-Communicate-With-The-Other-Guy-I’ll-Use-His thing, too.

I hate to let everyone in on a secret here, but the only way two systems can communicate is if they agree on a common format and protocol.  Nothing special here.

I’m sure Microsoft can define a couple of schemas for the “easy things”: address book contacts, images, music and video files.  But what good will this deliver for us?  Perhaps we might benefit slightly from uniformity - a single standard instead of vCards, EXIF and ID3 tags - but I can’t see how calling the WinFS APIs is any different from calling the vCard/EXIF/ID3 APIs directly. 

Another question: why isn’t Microsoft driving these common schemas through a common standards body?  Yet Another Windows-Specific Format?

Longhorn still doesn’t solve the problem that many of the relationships between our data objects are human-based.  Some relationships are specific to a single individual.  Human names are so context-specific that it’s virtually impossible to describe links using this attribute alone.  But what else do we have in some cases?  How does the record executive get his contact for “Marvin Gaye” to link to a list of albums and lyrics, while ensuring that my friend Marvin Gaye’s address isn’t associated with any of the same things?

Or the mobile client for Siebel, SAP, Great Plains, or salesforce.com, with all client-side data now open and available to VARs and other solution developers?

Yeah, right.  Just like XML has allowed me access to the proprietary data of each application I’m using.  The existence of an open data format doesn’t mean that your favorite (or mission-critical) application stores its data in the open!  I’m sure that all that raw, unfiltered, proprietary data within Microsoft Encarta is itching to get out.

This brings me to my final point: what use is the data on my WinFS drive if it can’t interoperate with my Linux box?  Yep, that’s my Linux box.  Next to my Windows 2000 box.  The Linux box that I do all of my word processing/spreadsheeting/surfing on.  What I see in WinFS is the Windows-centric design philosophy: everyone is using Windows, so we can just assume that all of their data will be somewhere in the Windows domain!

I’ll conclude with this juicy tidbit from ozzie.net:

And in each case, the designer ended up saying “now that we’ve invested in this framework, let’s offer it as a platform so that other developers can plug in their own modules and custom schemas, and a powerful ecosystem will naturally emerge!” But it hasn’t happened. No single framework has achieved PC-like or browser-like ubiquity; as powerful as they are, these environments have and will likely remain islands of function and islands of users.

Unfortunately, I don’t see how WinFS and Longhorn are any different than any of these other frameworks.


Robert Scoble has written a short response to my previous article.  His first question:

Serving files is one thing. But how much of the “WinFS experience” (which is quite cool) should be shared with older or non-Longhorn OS’s?

Good question, though not really one anyone but Microsoft can answer.  Without interoperability, WinFS isn’t much use to me, personally.

His second question:

Here’s the important question: why? What is the “win-win” for investing the time and work to give these to a standards body? How will Microsoft recoup its investment? Remember, at the end of the day, Microsoft is a business and needs to see a return on investment.

Well, the web was built on standards.  Heck, it was built before Microsoft decided it wanted in.  It’s a standardized, shared protocol (well, if you don’t count all the vendor-specific bugs and additions) that we all use to communicate.  It’s gained a fair amount of traction and popularity.  Some of the comments on Robert’s story here indicate that this is a view shared by many:

The difference between tcp/ip and the dozens of competing network protocols is standards, and the commitment to make things work together. IBM had a bigger internal network than the Internet for a *long* time – where is it now? Incompatible and dead. Novell made a fortune off of it’s own network protocol. Where is it now? Incompatible and dead.

Oh yeah.  There’s also the matter of Microsoft’s massive war chest.  Why not use some of that war chest to manoeuvre the company into a healthy competitive position, rather than the monopolistic giant it is now?  This is an illustration of Microsoft’s profit from their latest SEC filing, care of The Inquirer and groklaw:

The Server and Tools unit had costs of $1496 million and generated $370 million, a 24.7% profit.

The Information Worker units, responsible for application software, had costs of just $696 million and that generated $1591 million, a profit of 228.6% of the money they spent.

The grand-daddy of them all was the unit responsible for Windows. It had costs of just $545 million but generated a profit of $2264 million, a staggering 415.4% profit on the money they put into it.

Staggering indeed.  Makes you wonder why they worry about “what’s in it for Microsoft?”
