On Perl and backward compatibility
Much of my thinking about the future of Perl 5 stems from the following principles:
- New versions of Perl 5 should not break your existing software
- Backward compatibility must not stop Perl 5 from evolving
The message linked here offers many insights on perl 5.16 and beyond, based on the talk Jesse Vincent has been giving at various conferences this year. It's a great read if you're interested in the future of Perl the language.
When Perl and Ruby are compared, it is often said that Perl takes a lot of care and effort to stay as backward compatible as possible, while Ruby (the language and its ecosystem, such as rubygems, rails etc.) doesn't care as much and prioritizes evolving faster, making drastic changes more frequently.
I think this holds true to some extent - for example, scripts written for perl 5.8 (released more than 8 years ago) will most likely just work on perl 5.14 without any changes. That's probably not the case between Ruby 1.6 and 1.9.
This makes people feel much less worried about upgrading perl, which is a great thing, but I can imagine that the constraint ("upgrading perl should not break existing code, even in a major upgrade") makes Perl's development and evolution harder than it would otherwise be.
A similar thing is happening outside core language development as well: in CPAN modules.
CPAN module’s policies
Most (but not all, I know) CPAN modules, once uploaded to CPAN, try to keep their API stable and as backward compatible as possible. That's the de facto virtue of CPAN modules. If a module keeps changing its API with every release, it gets rated horribly, and people will eventually stop using the module/framework, thinking it's unstable and fragile.
While keeping the interface stable is generally a good thing, and I appreciate all the heroic effort Perl module developers put into preserving backward compatibility, it is unfortunate if that prevents their software from evolving as fast as it could.
My personal guess is that this happens simply because there's no good, easy way to manage the situation.
Example
Let's take an example. It is not a real one (notice the made-up version numbers :D), but it is pretty much what happens today.
In 201X, Catalyst 8 introduced some feature X, and a developer, Joe, wrote an extension called CatalystX::Foo based on the new feature. It declares its dependency on Catalyst with:
requires 'Catalyst', 8.0;
in Makefile.PL, which will later be written out to MYMETA.yml. He ships the extension to CPAN. When an end user installs CatalystX::Foo, it pulls down the latest version of Catalyst, and everything works fine. So far, so good.
A few months later, the Catalyst dev team decides that feature X was a good idea but needs some modifications, makes API-incompatible changes that break CatalystX::Foo, and ships Catalyst version 9.0.
So now, CatalystX::Foo **stops working when an end user upgrades Catalyst to version 9**. As long as the user stays on version 8 it's fine, but it's a time bomb now.
How do we prevent this kind of thing from happening? Today, here's what people do:
- Catalyst developers keep a list of affected downstream modules and notify the authors to upgrade to the newer API.
- Downstream (e.g. CatalystX::Foo) authors upload new versions to CPAN, declaring a dependency on Catalyst version 9.
- Catalyst gets shipped with a giant conflicts table (search for %conflicts in its Makefile.PL) which warns users that they have to upgrade those downstream modules after installation. They can't just depend on the newer versions.
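The essence of such a conflicts check can be sketched in a few lines of Perl. This is a simplified illustration, not the actual code shipped with Catalyst, and the module names and version numbers are invented to match the example above:

```perl
use strict;
use warnings;

# Hypothetical conflicts table: the minimum downstream versions known
# to work with this upstream release. Names and versions are made up.
my %conflicts = (
    'CatalystX::Foo' => '0.02',
    'CatalystX::Bar' => '1.10',
);

# Return the installed downstream modules whose versions are older
# than the minimum known-good version in the conflicts table.
sub broken_downstream {
    my ($conflicts, $installed) = @_;
    my @broken;
    for my $module (sort keys %$conflicts) {
        next unless exists $installed->{$module};
        push @broken, $module
            if $installed->{$module} < $conflicts->{$module};
    }
    return @broken;
}

# In the real check, the installed versions would come from loading
# each module and calling ->VERSION on it, rather than a hash.
my @broken = broken_downstream(\%conflicts, { 'CatalystX::Foo' => '0.01' });
warn "Please upgrade $_ after installing this release\n" for @broken;
```

Note that the burden of keeping %conflicts accurate still falls entirely on the upstream author, which is exactly the problem.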
While I respect the effort the upstream developers (i.e. Catalyst developers) put into maintaining the list, it looks like a non-ideal situation to me, because:
- it is odd that upstream modules have to take care of downstream, not the other way around
- it doesn't scale, and doesn't work for modules that are not on CPAN (i.e. DarkPAN, github)
- the warnings generated by the upstream (Catalyst) are useful, but some CPAN installers ignore Makefile.PL output (like cpanminus :)), and the output gets no attention in automated, scripted installs anyway
Personally, I don't want to maintain a list of Plack::Middleware modules once (say) I decide to break API compatibility in Plack 2.0.
Solutions
I don't have a magic bullet that solves this situation right away, but I have some ideas, and actual code to implement them. Here's the gist of it:
- Downstream modules should actually use version ranges in their dependency declarations
- Upstream modules should use semantic versioning or something similar, so that downstream can predict API incompatibilities from version numbers
- CPAN installers should install MYMETA.yml files into the site_perl library path so that they can be used for later introspection
- Every project should use a separate local::lib to get an isolated library path
- Write a tool that rescans the whole (per-project) library path to make sure there are no conflicts, and urges end users to upgrade/downgrade if there are
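To make the first two points concrete: if Catalyst followed semantic versioning, CatalystX::Foo could declare a range instead of a bare minimum. This is a hypothetical declaration, using the range syntax from the CPAN::Meta spec; whether a given installer honors it is a separate question:

```perl
# Hypothetical Makefile.PL declaration: CatalystX::Foo works with
# Catalyst 8.x, and is known to break with the API changes in 9.0.
requires 'Catalyst', '>= 8.0, < 9.0';
```

With that range recorded in the installed MYMETA.yml, a scanning tool could flag the conflict the moment Catalyst 9.0 lands in the project's library path, instead of the user finding out at runtime.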
Overall, this is what bundler does for Ruby gems, and I'm trying to make carton do the same thing for CPAN and local::lib.
It will make end users happy by letting them lock their dependencies, and will make module authors happy by letting them evolve fast, without worrying *too much* about breaking backward compatibility downstream (although they do care about that to some extent anyway).
I plan to incorporate all of this into the Carton 1.0 release, which should happen before or during YAPC::Asia 2011 in Tokyo, two weeks from now.
In other words, this is how conference driven development works :)