Tatsuhiko Miyagawa's Blog

PSGI: start_response or not?

September 23, 2009

For the impatient: can you make the echo.psgi streaming server (run it as plackup -i AnyEvent -a eg/dot-psgi/echo.psgi port 9090) work without $start_response?

Python’s WSGI and Ruby’s Rack (as well as JavaScript’s JSGI) have a significant interface difference: start_response versus three-param responses. Namely:

A Python app gets two parameters, env and start_response (in Perlish code):

sub app {
 my($env, $start_response) = @_;
 # do something
 my $w = $start_response->($status, $headers);
 return $body; # or $w->write($body)
}

This is a little ugly, but useful for server-side push in environments without threads (I think). Ruby has threads (whether they are native or user-level threads doesn’t really matter here), so a Rack app takes only one argument and returns three values as its response (in Perlish code):

sub app {
 my $env = shift;
 # do something
 return [ $status, $headers, $body ];
}

We liked Rack’s simplicity and made it the default interface. But this is Perl, and we can’t assume threads are available everywhere. Of course we do have ithreads, and the wonderful Coro to do continuations, which is actually a much better thing than ithreads (IMHO!). But asynchronous environments like AnyEvent, Perlbal, POE or other event loops would suffer if we required the three-param response, since they need to pause a request and resume it once it’s ready, or do server-side push.

So we (half-)decided to make $start_response optional: a server passes it when it can, and signals that with the ‘psgi.async’ $env variable. This ‘optional’ thing would make the whole thing a lot more complicated. (I haven’t written this async spec down for exactly this reason.)
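A rough sketch of what an app supporting both modes could look like. It assumes the server sets ‘psgi.async’ whenever it passes $start_response, and that the writer object has write() and close() methods (close() is my guess; none of this is specced yet):

```perl
use strict;
use warnings;

# Hypothetical sketch of the half-decided 'psgi.async' option: the server
# sets $env->{'psgi.async'} and passes $start_response; otherwise the app
# falls back to the default three-param response.
my $app = sub {
    my ($env, $start_response) = @_;
    my $status  = 200;
    my $headers = [ 'Content-Type' => 'text/plain' ];

    if ($env->{'psgi.async'} && $start_response) {
        # Streaming mode: headers go out now, body chunks follow.
        my $writer = $start_response->($status, $headers);
        $writer->write("chunk $_\n") for 1 .. 3;
        $writer->close;    # assumed end-of-body method on the writer
        return;
    }

    # Default mode: the whole response is returned at once.
    return [ $status, $headers, [ map { "chunk $_\n" } 1 .. 3 ] ];
};
```

Note that the app author now has to write (and test) the body logic twice.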

Is being optional good or bad?

Shall we make start_response optional, or make it the default and ditch the three-param response?

Being optional means more flexibility: server implementations don’t need to pass $start_response if it doesn’t make sense there, and application frameworks may ignore $start_response if they don’t want to do streaming. This might sound all good.

I observed an interesting thing while writing and testing Catalyst::Engine::PSGI, the PSGI adapter for Catalyst. Most Catalyst engines have a write() method that outputs the HTTP headers (if they haven’t been sent yet) and prints the content to the client immediately. This is useful when you want to output a huge data file (like a CSV) from the database without eating much memory. Of course you can rewrite that to save to a temp file first, or to use a PerlIO or IO-like object that keeps doing database lookups until it’s done, via the getline() interface.
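The getline() alternative can be sketched like this. CSVBody is a hypothetical class, and $fetch_row stands in for a real database cursor (with DBI it might wrap fetchrow_arrayref); the server is assumed to call getline() until it returns undef and then call close():

```perl
package CSVBody;
use strict;
use warnings;

# IO-like body: the server calls getline() until it returns undef, then
# close(). $fetch_row is a hypothetical stand-in for a database cursor;
# it returns an array ref per row, or undef when no rows are left.
sub new {
    my ($class, $fetch_row) = @_;
    return bless { fetch_row => $fetch_row }, $class;
}

sub getline {
    my $self = shift;
    my $row  = $self->{fetch_row}->() or return;    # end of data
    return join(',', @$row) . "\n";                 # naive CSV, no quoting
}

sub close { }    # nothing to release in this sketch

package main;

my $app = sub {
    my $env  = shift;
    my @rows = ([ 1, 'alice' ], [ 2, 'bob' ]);      # fake query result
    my $body = CSVBody->new(sub { shift @rows });
    return [ 200, [ 'Content-Type' => 'text/csv' ], $body ];
};
```

Only one row is held in memory at a time, but the code is shaped around the server pulling data rather than the app printing it.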

Actually, most Catalyst engine implementations today support this immediate print.

The problem is that the current Engine::PSGI doesn’t support $start_response (yet) in $c->res->write. Output passed to write() is buffered and then returned, once everything is done, as a three-param response [ $status, $headers, $buffered_body ] to the PSGI server backend.

This might be okay, but it’s not ideal. To support immediate output, I should update Catalyst::Engine::PSGI to support $start_response: if the server passes it, we’ll start the response immediately and then write the body to the writer object.
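A minimal sketch of that write() behaviour, using a hypothetical SketchResponse class rather than the real Catalyst::Engine::PSGI internals. It streams through the writer when $start_response was passed and buffers into a three-param response otherwise; the writer’s close() method is an assumption:

```perl
package SketchResponse;
use strict;
use warnings;

# Hypothetical response object mimicking the write() behaviour described
# above; this is not the real Catalyst::Engine::PSGI code.
sub new {
    my ($class, %args) = @_;
    return bless {
        status         => $args{status}  || 200,
        headers        => $args{headers} || [],
        start_response => $args{start_response},    # undef if unsupported
        buffer         => '',
        writer         => undef,
    }, $class;
}

sub write {
    my ($self, $chunk) = @_;
    if (my $start = $self->{start_response}) {
        # Streaming: start the response on the first write only.
        $self->{writer} ||= $start->($self->{status}, $self->{headers});
        $self->{writer}->write($chunk);
    }
    else {
        $self->{buffer} .= $chunk;    # keep it until finalize()
    }
    return;
}

# Called when the request is done.
sub finalize {
    my $self = shift;
    if (my $start = $self->{start_response}) {
        $self->{writer} ||= $start->($self->{status}, $self->{headers});
        $self->{writer}->close;       # assumed end-of-body method
        return;
    }
    # Buffered: hand everything back as a three-param response.
    return [ $self->{status}, $self->{headers}, [ $self->{buffer} ] ];
}

package main;
```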

But then, most server implementations other than AnyEvent and Perlbal don’t support $start_response, so they would have to add support for it, where they can, to enable streaming writes.

This forces both sides (app side and server side) to implement both the start_response mode and the direct response mode.
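Here is roughly what the server side has to do when both modes are allowed. run_app() and StringWriter are hypothetical names, the string $out stands in for the client socket, and the status line is simplified:

```perl
package StringWriter;
use strict;
use warnings;

# Writer handed back by $start_response; appends chunks to a string that
# stands in for the client socket.
sub new   { my ($class, $out) = @_; return bless { out => $out }, $class }
sub write { my ($self, $chunk) = @_; ${ $self->{out} } .= $chunk; return }
sub close { return }

package main;

# Hypothetical server-side dispatcher that must accept both response styles.
sub run_app {
    my ($app, $env) = @_;
    my $out = '';

    # Simplified status line (reason phrase omitted) plus headers.
    my $write_head = sub {
        my ($status, $headers) = @_;
        my @h = @$headers;
        $out .= "HTTP/1.0 $status\r\n";
        while (my ($k, $v) = splice @h, 0, 2) { $out .= "$k: $v\r\n" }
        $out .= "\r\n";
    };

    # start_response mode: send the head now, return a writer for the body.
    my $start_response = sub {
        $write_head->(@_);
        return StringWriter->new(\$out);
    };

    my $res = $app->({ %$env, 'psgi.async' => 1 }, $start_response);

    # Direct mode: the app returned [ $status, $headers, $body ].
    if (ref $res eq 'ARRAY') {
        my ($status, $headers, $body) = @$res;
        $write_head->($status, $headers);
        if (ref $body eq 'ARRAY') {
            $out .= $_ for @$body;
        }
        else {
            while (defined(my $line = $body->getline)) { $out .= $line }
            $body->close;
        }
    }
    return $out;
}
```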

Also, when we think about middleware, which sits on both sides, things get even more complicated.

tokuhirom and I discussed what a gzip compression middleware would look like in WSGI and in Rack, and how we’d write one in Perl with or without $start_response.
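The extra cost for middleware shows up even in a toy. In this sketch, uppercase_middleware() and UppercaseWriter are hypothetical, and uppercasing stands in for gzip compression to keep it short. The three-param path is a simple post-processing step, while the start_response path has to wrap both $start_response and the writer it returns. A real gzip middleware would also need to set Content-Encoding and drop or recompute Content-Length.

```perl
use strict;
use warnings;

# Hypothetical middleware that transforms the response body. Uppercasing
# stands in for gzip compression to keep the sketch short.
sub uppercase_middleware {
    my $app = shift;
    return sub {
        my ($env, $start_response) = @_;

        # Streaming mode: wrap $start_response so the writer it returns
        # transforms each chunk before it reaches the server.
        my $wrapped = $start_response && sub {
            my $writer = $start_response->(@_);
            return UppercaseWriter->new($writer);
        };

        my $res = $app->($env, $wrapped);
        return $res unless ref $res eq 'ARRAY';

        # Three-param mode: post-process the returned body directly.
        my ($status, $headers, $body) = @$res;
        my @chunks;
        if (ref $body eq 'ARRAY') { @chunks = @$body }
        else {
            while (defined(my $l = $body->getline)) { push @chunks, $l }
            $body->close;
        }
        return [ $status, $headers, [ map { uc } @chunks ] ];
    };
}

# Writer wrapper used only in streaming mode.
package UppercaseWriter;
sub new   { my ($class, $inner) = @_; return bless { inner => $inner }, $class }
sub write { my ($self, $chunk) = @_; return $self->{inner}->write(uc $chunk) }
sub close { my $self = shift; return $self->{inner}->close }

package main;
```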

Overall, the start_response interface is more flexible but ugly. The three-param response is cleaner and makes middleware easier to adapt, but it has limitations in event loops. Allowing both makes both sides happy at first, but eventually makes both sides unnecessarily complicated, I think.

To be honest, using start_response everywhere might make both sides harder (and uglier) to write initially, but would eventually need only a reasonable amount of code. On the other hand, if we can implement event-loop pause/resume and server push with the three-param style response (with a special $body type?), then we can leave start_response out.
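One hypothetical shape for such a special $body type: the app returns an ordinary three-param response whose body is a push-style object, and keeps writing into it from event-loop callbacks. PushBody, on_data() and on_close() are made-up names, and @pending stands in for a real event loop such as AnyEvent:

```perl
package PushBody;
use strict;
use warnings;

# Hypothetical "special $body type": it travels inside an ordinary
# three-param response, and the app writes chunks into it later, e.g.
# from an event-loop callback. The server subscribes with on_data() and
# on_close(); chunks written before it subscribes are queued.
sub new { my $class = shift; return bless { queue => [], closed => 0 }, $class }

sub write {
    my ($self, $chunk) = @_;
    if ($self->{on_data}) { $self->{on_data}->($chunk) }
    else                  { push @{ $self->{queue} }, $chunk }
    return;
}

sub close {
    my $self = shift;
    $self->{closed} = 1;
    $self->{on_close}->() if $self->{on_close};
    return;
}

sub on_data {
    my ($self, $cb) = @_;
    $self->{on_data} = $cb;
    $cb->($_) for splice @{ $self->{queue} };    # flush queued chunks
    return;
}

sub on_close {
    my ($self, $cb) = @_;
    $self->{on_close} = $cb;
    $cb->() if $self->{closed};
    return;
}

package main;

my @pending;    # stand-in for an event loop's queue of ready callbacks

my $app = sub {
    my $env  = shift;
    my $body = PushBody->new;
    # The app "resumes" later, as a timer or I/O watcher would.
    push @pending, sub { $body->write("tick 1\n") };
    push @pending, sub { $body->write("tick 2\n"); $body->close };
    return [ 200, [ 'Content-Type' => 'text/plain' ], $body ];
};
```

The app keeps the one-argument, three-param signature; the server only needs to recognize the push-style body.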

So, today I’m okay with making it optional, but I’m really afraid this will bite us sooner rather than later. We might be better off deciding to leave it out (no start_response at all) or to put it in (start_response everywhere).

What do you think?