grack.com

UPDATE: I’ve written a PubSubHubbub-to-XMPP gateway that solves some of the issues of running a real-time feed reader behind a firewall.

UPDATE 2: rssCloud has a serious vulnerability that needs to be addressed in the protocol. I’ve linked some security recommendations here that rssCloud hubs should implement as soon as possible.

These last few months have brought us not one, but two RSS-to-real-time protocols: PubSubHubbub and rssCloud. While rssCloud has been “around” for a while, it never saw much adoption or interest until recently.

As a developer, the important question is: which of these two protocols should I focus on?

When you compare the two protocols technically, you find that there are some similarities (UPDATE: see here for a more in-depth comparison of the APIs):

  • Both PubSubHubbub and rssCloud allow the hub to live on a different server than the one that serves the RSS feed. This lets the complexity of both protocols live in a black box somewhere else, managed by someone who cares more about getting the details right.
  • Both offer a fairly simple “ping” notification for publishers. An rssCloud publisher can make a simple POST request to the specified cloud server, which then verifies that the update was real (alternatively, rssCloud can use XML-RPC or SOAP, neither of which is in fashion right now). PubSubHubbub has a very similar POST operation with very similar semantics.
  • Both offer simple APIs on the hub for subscribing to feeds. PubSubHubbub offers an unsubscribe option, while rssCloud times out subscriptions after 25 hours (the client is expected to re-subscribe after 24).
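To make the similarity concrete, here is a minimal sketch of the request bodies a publisher and a subscriber might send. The form field names (`hub.mode`, `hub.url`, `hub.topic`, `hub.callback`, `hub.verify` for PubSubHubbub; `url` for the rssCloud ping) are my reading of the protocol drafts at the time and should be checked against the current specs. The function names and the example URLs are hypothetical. The only timing facts taken from the comparison above are the 24-hour re-subscribe and the 25-hour expiry.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# rssCloud timing as described above: re-subscribe after 24 hours,
# because the hub drops the subscription after 25.
RESUBSCRIBE_AFTER = timedelta(hours=24)
EXPIRES_AFTER = timedelta(hours=25)


def pubsubhubbub_publish_body(feed_url):
    """Form body a publisher POSTs to a PubSubHubbub hub (assumed field names)."""
    return urlencode({"hub.mode": "publish", "hub.url": feed_url})


def pubsubhubbub_subscribe_body(topic_url, callback_url):
    """Form body a subscriber POSTs to a PubSubHubbub hub (assumed field names)."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
        "hub.verify": "sync",
    })


def rsscloud_ping_body(feed_url):
    """Form body a publisher POSTs to an rssCloud server's ping URL (assumed field name)."""
    return urlencode({"url": feed_url})


def needs_resubscribe(subscribed_at, now):
    """True once an rssCloud subscription is old enough that it should be renewed."""
    return now - subscribed_at >= RESUBSCRIBE_AFTER


def has_expired(subscribed_at, now):
    """True once the hub would have dropped the rssCloud subscription."""
    return now - subscribed_at >= EXPIRES_AFTER
```

Sending one of these is then an ordinary form-encoded POST to the hub. The two publish bodies differ only in field names, which is why supporting both pings is cheap for a publisher.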

There are some significant differences between the two protocols, however:

  • PubSubHubbub supports RSS and Atom out of the box. rssCloud does not support Atom right now, as no one has defined how it would look inside an Atom feed.
  • PubSubHubbub provides “fat pings” to clients, while rssCloud only provides basic notification updates. A PubSubHubbub subscriber can keep tabs on a feed entirely through the ping notifications, allowing it to skip polling of any feed that supports the update protocol. rssCloud requires the subscriber to re-poll the feed after receiving a ping. The “fat ping” has the advantage of saving the feed publisher bandwidth, since clients aren’t downloading the same repeated feed entries time after time, and potentially CPU cycles, since the feed publisher only has to generate a single feed output for the hub rather than for all of its clients (this can be mitigated by caching the generated feed). The fat ping requires more work on the part of the hub, however, as it needs to detect which parts of the feed have changed and push those parts into the subscriber notification dispatch queue.
  • PubSubHubbub lets you subscribe any endpoint you like (with some intelligence to prevent you from spamming pings to arbitrary hosts). rssCloud infers your endpoint hostname from the IP address of the subscription request, which requires your subscription logic to live on the same servers as your ping endpoints.
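Two of these differences are easy to illustrate in code. The sketch below is not taken from either reference hub; it is a simplified illustration with hypothetical function names. `new_entries` shows the extra work a fat-ping hub takes on: it compares a freshly fetched Atom feed against the entry ids it has already seen and returns only the new ones. A real hub would also need to notice edits to existing entries, which this ignores. `rsscloud_callback_url` shows why an rssCloud subscriber cannot point notifications at another machine: the host comes from the connection, not from the request. The `port` and `path` parameters are assumed to be supplied by the subscriber.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def new_entries(feed_xml, seen_ids):
    """Return the Atom <entry> elements whose <id> is not yet in seen_ids.

    seen_ids is updated in place, so calling this again with the same feed
    returns an empty list. Only brand-new ids are detected, not edits.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(ATOM_NS + "entry"):
        entry_id = entry.findtext(ATOM_NS + "id")
        if entry_id and entry_id not in seen_ids:
            seen_ids.add(entry_id)
            fresh.append(entry)
    return fresh


def rsscloud_callback_url(remote_addr, port, path):
    """Build the notification URL the way an rssCloud hub would have to:
    the host is the IP address the subscription request came from, while
    port and path are taken from the request itself (assumed parameters).
    """
    return f"http://{remote_addr}:{port}{path}"


SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry><id>tag:example.com,2009:a</id><title>Old post</title></entry>
  <entry><id>tag:example.com,2009:b</id><title>New post</title></entry>
</feed>"""

seen = {"tag:example.com,2009:a"}
fresh = new_entries(SAMPLE_FEED, seen)
fresh_titles = [e.findtext(ATOM_NS + "title") for e in fresh]
```

With the sample feed, only the second entry is queued for delivery to subscribers. An rssCloud hub skips this step entirely and just tells each subscriber that the feed URL changed.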

Back to the question: which of these protocols should I focus on? The answer probably depends on what you are doing.

  • If you are a publisher that publishes both RSS and Atom feeds, it’s trivial for you to support pinging rssCloud and PubSubHubbub hubs. There’s nothing stopping you from doing it now - just figure out which hubs to use. If you use FeedBurner and PingShot, Google has already cloud-enabled your blog for you. If you want to control your own hub, you’ll probably want to pick an off-the-shelf one. PubSubHubbub is likely the best choice here as it both saves you bandwidth and gets you real-time support in FriendFeed.
  • If you are planning on writing a hub, you’ll probably want to start with rssCloud. Its implementation will be simpler than PubSubHubbub as all it does is redistribute ping notifications.
  • If you are a feed reader or a content spider, you’ll probably have to implement both. I believe that PubSubHubbub gives you the biggest bang for the buck now, as it’s supported by nearly all of the Google feed properties: FeedBurner (the Atom/RSS intermediary choice for a significant number of self-hosted blogs), Blogger (millions of blogs) and Google Reader feeds. It’s also supported by LiveJournal (which lists 20+ million blogs on its homepage).  rssCloud is fairly new, but it managed to score a big integration with wordpress.com (7.5 million blogs, according to their own blog). Unfortunately, as not all of the big sites have implemented both, you’ll have to deal with two technologies for the time being.

After researching both of the technologies in depth, I’d say that PubSubHubbub is the better technology overall. While more complex to implement for hubs, it offers far more to feed readers and publishers in terms of bandwidth savings and real-time updates. For companies doing content analysis, PubSubHubbub is a huge win: it brings the power of the Twitter firehose to RSS. No matter which technology you choose, however, you’ll be getting your RSS feed updates far more often. It might even allow the next real-time technology to be built on an open XML feed rather than on one company’s proprietary servers.
