Tatsuhiko Miyagawa's Blog

CPAN Realtime Updates

August 31, 2009

This bot periodically sends smart HTTP GET against cpan.cpantesters.org fast CPAN mirror to get semi-realtime updates. The delay is usually less than a minute (about 30 seconds) and is much faster than other bots based on search.cpan.org.

via friendfeed.com

CPAN is Perl’s all-time favorite “language feature”. It’s a great ecosystem maintained, mirrored and smoke tested by lots of people.

There’s one problem with this mirror system, though: it usually takes a day or two for the latest module updates to reach your mirror. The search.cpan.org site is updated pretty frequently, but it still takes a few hours for your module to be indexed there. That’s a long time, isn’t it?

I’m not the only one frustrated by that. Andreas, the creator of CPAN, and Dave Golden, the current maintainer of Module::Build, along with folks at the Perl QA hackathon, came up with a hack to speed up the rsync mirror process for a few mirrors. The hack rebuilds an index of recently updated files whenever a new file is created. Instead of doing the full rsync delta, which is CPU expensive, a mirror just polls the updated index file and syncs only the files listed there.
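Here is a minimal sketch in Python of the "poll the index, sync only what changed" idea. It is not the actual mirror code: the entry layout (a list of dicts with `path` and `epoch` keys) and the function name `select_updates` are assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, object]  # hypothetical shape: {"path": str, "epoch": float}


def select_updates(entries: List[Entry], last_seen: float) -> Tuple[List[Entry], float]:
    """Return entries newer than last_seen (oldest first) and the new high-water mark."""
    # Keep only entries that appeared after the previous poll.
    fresh = [e for e in entries if e["epoch"] > last_seen]
    # Process oldest first so updates are handled in upload order.
    fresh.sort(key=lambda e: e["epoch"])
    # Advance the mark only if something new showed up.
    new_last = fresh[-1]["epoch"] if fresh else last_seen
    return fresh, new_last
```

A client would remember the returned mark between polls and fetch only the paths in the first return value, which is much cheaper than walking the whole tree.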

cpantesters.org is one of the fast mirrors that adopted this sync, and it’s currently set to sync every 20 seconds. But if many of you want these almost-realtime updates and send frequent polls to the master PAUSE or cpantesters, that doesn’t really scale.

So I made this: a CPAN realtime bot on FriendFeed. It’s a bot that periodically fetches the fresh index from cpantesters.org to get the latest updates and sends them to a FriendFeed group. The latency is usually about 30 seconds, so it’s pretty much realtime.
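The "smart HTTP GET" is presumably a conditional GET, so the server can answer 304 Not Modified when nothing changed. The following Python sketch shows that technique under that assumption. It is not the bot's real code, and the names `build_request`, `poll_once` and the `state` dict are hypothetical.

```python
import urllib.error
import urllib.request


def build_request(url, state):
    """Build a GET that carries the validators saved from the previous response."""
    headers = {}
    if state.get("etag"):
        headers["If-None-Match"] = state["etag"]
    if state.get("last_modified"):
        headers["If-Modified-Since"] = state["last_modified"]
    return urllib.request.Request(url, headers=headers)


def poll_once(url, state, opener=urllib.request.urlopen):
    """Return the body if the resource changed, or None on 304 Not Modified."""
    req = build_request(url, state)
    try:
        with opener(req) as resp:
            body = resp.read()
            # Remember the validators for the next poll.
            state["etag"] = resp.headers.get("ETag")
            state["last_modified"] = resp.headers.get("Last-Modified")
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged since the last poll, nothing to download
            return None
        raise
```

Calling `poll_once` in a loop with a sleep of a few seconds, and keeping `state` around between calls, gives you cheap polling: most requests come back as an empty 304.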

The reason I chose FriendFeed over Twitter is that on FriendFeed you have various options to receive updates: by default you can get the updates via Email or XMPP, but you can also use FriendFeed’s Realtime API (comet long-poll).

I also added a hook to this bot that sends PubSubHubbub pings to Google’s hub as well as SuperFeedr, so you can receive the updates via Webhook too. This postbin is an example callback sent from SuperFeedr using PubSubHubbub via the pubsubhubbub callback proxy.
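For the curious, a PubSubHubbub publish ping is just a form-encoded POST to the hub with `hub.mode=publish` and `hub.url` set to the feed that changed. Here is a minimal Python sketch of building one. The function name `build_publish_ping` is hypothetical, the hub and feed URLs are whatever you pass in, and this is not the bot's actual hook.

```python
import urllib.request
from urllib.parse import urlencode


def build_publish_ping(hub_url, topic_url):
    """Build (but do not send) the POST that tells a hub a feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return urllib.request.Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send the ping. After that the hub fetches the feed itself and pushes the new entries to subscribers, so the bot never has to talk to each subscriber directly.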

So: use the existing CPAN ecosystem as a data source, and use the cloud (FriendFeed, Google and SuperFeedr) as a scaling infrastructure to deliver the realtime updates. Awesome, isn’t it?