grack.com

Blog

Bittorrent + Reed Solomon Code

For a while I’ve been wondering if a hybrid Bittorrent/Reed-Solomon system would be a more effective swarming download system. 

Just to make this clear ahead of time, I don’t believe the benefits of the Bittorrent/Reed-Solomon system would kick in until the torrent had been running for a longer amount of time.  The reason for this is that Bittorrent has a built-in “rarest first” algorithm (described in this Bittorrent paper).  The obvious goal the “rarest first” algorithm to replicate the rarest pieces of the system first, while leaving the most common pieces - the ones most likely to be around longer - until the end. 

In a Bittorrent network where a seed node exists for the entire period of the torrent’s existence, this parity system would provide no benefit.  This system really shines, however, by increasing the probability that a client node can retrieve enough data from a seedless network to become a seed itself.

So, on to the idea.  I’m proposing that instead of seeding pieces of the original file at the start, the seed distributes a large number of Reed-Solomon-encoded “parity files” (perhaps twice the number of blocks as the original number of Bittorrent blocks - maybe more).  These blocks would be “virtual” - computed in real-time as they are requested by the clients.  Reed-Solomon encoding is a fast process.  I’ve seen throughput of ~360MB/s on my Athlon 2100+.  This is far more than enough for real-time encoding and distribution (the blocks could always be pre-calculated and cached for a less powerful system). 

These parity blocks would be distributed using the original Bittorrent algorithms (including rarest-first).  For simplicity, I’ll use the situation of providing 100% redundancy - twice the number of blocks as necessary to reconstruct the file.  Once a Bittorrent client had received enough blocks to restore the original file (50% of the blocks), it runs the Reed-Solomon decoding process to create the original file.  From the original file, it can then continues as a seed - and can calculate the extra parity blocks as necessary.

The goal of the system is to ensure that each Bittorrent client has as diverse a set of blocks as possible.  Imagine a situation with an infinite number of Reed-Solomon-encoded blocks, where n blocks are required to reconstruct the target file.  Each downloading client can get the next block in the infinite sequence of blocks from the server.  Once a client has received n of these blocks, it can then generate any of the other possible encoded blocks. 

If this hypothetical ad-hoc network suddenly lost each of its seed nodes, the remaining clients would only need to share n different blocks between them to reconstruct the file.  I believe that as the number of possible blocks goes up, the probability of this occurring is greater than if only n possible blocks existed.

Perhaps an example network might show the advantages:

In this system, there are 200 parts to the file.  In the original Bittorrent system, this equates to 200 blocks.  In the Reed-Solomon-encoded system with 100% redundancy, this equates to 400 blocks.  There are 30 clients trying to download this file.  When the seed node gives out, each client has 10 blocks of the file (5% in the Bittorrent case, 2.5% in the Reed-Solomon case).

First, the Bittorrent case:

So, the probability that a client has any block is 5%.  This means that there is an 95% chance that a client does not have a particular block.  The probability that none of the 30 clients has this particular block is then 95%^30 = 21.46%.  The probability that one of the clients has this block is (1-21.46%)=78.54%.  The probability that each block is possessed by at least one client is then 78.54%^200 = 1x10^-19%.

Now, in the Reed-Solomon case:

The probability that a client has any block is 2.5%.  This means that there is an 97.5% chance that a client does not have a particular block.  The probability that none of the 30 clients has this particular block is then 97.5%^30 = 46.79%.  The probability that one of the clients has this block is (1-46.79%)=53.21%.  The probability that enough blocks are contained in the system is calculated using the Binomial function with:

  • Probability of success, p = 53.21%
  • Number of trials, n = 400

We take the probability that the number of successes will be greater than or equal to 200 in this case.  The probability that not enough blocks (ie: 0-199 blocks were available, 200 were required) exist in the system is now:

k=0
Σ B(k,n=400,p=53.21%) = 9.08%. 
199

There is now a 90.92% (the inverse of the above probability) chance that the system can recover enough to provide another valid seed in the future.  This is significatly better than the 1x10^-19% percent chance from before.

Thoughts?

x0rfbserver

UPDATE: x0rfbserver has disappeared from the web.  No worries, however, as x11vnc is an excellent replacement.

I’ve just discovered the joy of x0rfbserver.  I always had trouble understanding why the unix VNC’s wouldn’t allow exporting of a pre-existing X session.  With x0rfbserver, you can export your existing X session to another computer.  This is how WinVNC works under Win32 - it’s handy in certain cases when you don’t own a KVM switch.

You’ll need to install the xclass and rfb rpms.  I couldn’t find any for RedHat, but the Connectiva i386 ones worked like a charm.

Note: After using it for a few hours, it seems like it can be a bit crashy at times.  I have an SSH window open that I use to restart it on the other machine.  I’m still hooked!

RSS 0.91 feed

I’ve added an RSS 0.91 feed.  Get it at http://feeds.grack.com/grack, or use the RSS links at the bottom of this page.

Bochs GUI

This is the prototype Bochs GUI that I’m working on. Many Windows users judge the quality of a program by the “Windows-ness” of its interface. Bochs is a powerful tool, but as the current Bochs interface looks archaic (despite it’s full-featuredness), it may potentially turn off a number of users. My goal is to create a UI that feels rich and native to a Win32 user, without needing major modifications to Bochs.

Screenshots:

(UPDATE) Newer screenshots:

Here is a summary of my project goals:

  • Hosting multiple (simultaneous) simulators
  • Allow simulator window to auto-fit virtual machine’s current resolution or to operate in full-screen, matching resultion mode
  • Easier (wizard?) configuration of devices by “adding hardware”, rather than configuring all possible devices
  • Single-click launch of a virtual machine (.bochs file?) from Explorer
  • Explicit input capture mode with keyboard-shortcut capture escape (Ctrl+Alt)
  • Support for sending “Windows” key, Alt+Tab and Ctrl+Alt+Del (via Ctrl+Alt+Ins) to virtual machine

The changes required to support this are in win32.cpp only. I’ve added an environment variable to specify the parent window of the simulation window, as well as a number of custom messages that the simulation window sends its parent. If the environment variable is set, bochs assumes that it is being hosted within a different control and it hides the main win32 GUI window.

The patches are far too ugly to send right now (though they are small). I’m hoping that someone can help me integrate this cleanly into Bochs without disrupting the use of the standard Bochs Win32 interface.

I decided not to enhance the wxWindows port of Bochs because I thought I could make use of a number of .NET toolkits to create a rich UI. Since I’ve been using C# exclusively lately, this was a natural fit for me. It does limit the users of the system to those with the .NET framework installed, but as this was included in one of the recent batches of critical updates, it should be installed on a number of systems.

Identd added

I added a page for a description and download for Identd (binaries and source).