Tatsuhiko Miyagawa's Blog

_why

August 20, 2009

So _why disappeared from the internet. I’m a Perl guy and have never met him in person, at conferences or anywhere else, but I have always been inspired by his crazy, wonderful code, like Camping.

I remembered that I had actually exchanged email with him once, 3 years ago, and he was quite nice. In January 2006 I was in Taipei, hacking with Audrey Tang (at that time Autrijus) on porting _why’s syck YAML parser and encoder to Perl, and then implementing JSON::Syck as a subset of it. This was long before the JSON RFC was published, and it wasn’t clear how to encode high-bit characters in JSON: as raw UTF-8 bytes or in the Unicode escape form (\uXXXX). Well, as you can see, it’s still confusing some libraries :)

I sent an email to Audrey, Douglas Crockford (the author of JSON) and _why:

Hi,

JSON spec says "A string is a collection of zero or more Unicode
characters, wrapped in double quotes, using backslash escapes. A
character is represented as a single character string." but i can't
get what it exactly means.

Suppose, when you pass utf-8 bytes from Perl to JSON, what are we
supposed to do to encode it into JSON? Should it be "{utf-8 bytes}",
or "\uXXXX"?

Douglas replied to me respectfully:

These are good questions. JSON is concerned with characters, not bytes.
The \uXXXX should be used to encode UTF-16 codes, not byte codes.

JavaScript never sees the original encoding. It only sees the UCS-2
material that the browser produces from it. So the thing that Syck 
is doing is wrong.

Also see http://www.crockford.com/JSON/draft-jsonorg-json-00.txt

Note that when Douglas said “the thing that Syck is doing is wrong”, he didn’t mean _why’s syck but the Perl Syck binding we were working on at that time. We exchanged another round of replies for clarification, and eventually arrived at what the JSON RFC specifies today.
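To make the distinction concrete, here is a minimal sketch in Python (not Perl, and not the code JSON::Syck actually used) of the forms being discussed. The sample string and variable names are made up for illustration, and the "byte-wise" variant is only my guess at the kind of mistake Douglas was warning about, not a reconstruction of what our binding did.

```python
import json

# Hypothetical sample: "é" is inside the BMP, the emoji (U+1F600) is outside it.
text = "caf\u00e9 \U0001F600"

# Form 1: escape non-ASCII characters as \uXXXX. The escapes are UTF-16
# code units, so the emoji becomes a surrogate pair.
escaped = json.dumps(text, ensure_ascii=True)

# Form 2: keep the characters literal and encode the document as UTF-8.
literal_bytes = json.dumps(text, ensure_ascii=False).encode("utf-8")

# The mistake (assumed for illustration): treating each UTF-8 byte as if
# it were a character and escaping the bytes one by one.
bytewise = json.dumps(text.encode("utf-8").decode("latin-1"))

print(escaped)        # "caf\u00e9 \ud83d\ude00"
print(literal_bytes)  # b'"caf\xc3\xa9 \xf0\x9f\x98\x80"'
print(bytewise)       # "caf\u00c3\u00a9 \u00f0\u009f\u0098\u0080"

# Forms 1 and 2 decode to the same characters; the byte-wise form does not.
assert json.loads(escaped) == text
assert json.loads(literal_bytes.decode("utf-8")) == text
assert json.loads(bytewise) != text
```

Both of the first two forms carry the same characters, which is the point of “JSON is concerned with characters, not bytes”.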

Later _why replied to me privately:

Mmmmnn. Good questions, all. Good answers as well. I value you tremendously
already.

That’s very kind, and even though I’ve never met him in person, I can easily imagine what kind of personality he has. I’ll miss his presence on the web and his code (though it’s already been mirrored by lots of people). Thank you, _why.