Tatsuhiko Miyagawa's Blog

JSON, jQuery and some Unicode characters (U+2028)

February 16, 2009

Say you have a web application that generates JSON dynamically from the data in your database, which probably comes from the cloud. Your data might contain some unusual characters, like U+2028, the Unicode LINE SEPARATOR. If you encode it with Perl's JSON::XS (and probably with other JSON encoder modules as well), you get UTF-8 encoded JSON like this:

  {"text":"Foo[\xe2\x80\xa8]Bar"}

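Note that [\xe2\x80\xa8] here stands for the 3 bytes that make up the UTF-8 encoding of the single character U+2028. This is valid JSON: the spec only requires quotation marks, backslashes and control characters (U+0000 through U+001F) to be escaped inside strings. But jQuery uses the browser's built-in eval() to parse JSON, and JavaScript treats U+2028 as a line terminator, which is not allowed inside a string literal, so Safari and Firefox can't parse this JSON. You can confirm that by pasting this into your location bar: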

  javascript:try{eval('({"text":"Foo\u2028Bar"})')} catch(e){alert(e)}

This gives you a SyntaxError.
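If you want to spot the offending characters before a response ever reaches eval(), a minimal sketch might look like this. It scans for raw U+2028 and U+2029 (PARAGRAPH SEPARATOR, which JavaScript treats the same way); the name `findLineSeparators` is hypothetical. Keep in mind that ES2019 later allowed both characters in JavaScript string literals, so current engines evaluate the bookmarklet above without error.

```javascript
// Hypothetical helper: return the indices of raw U+2028 (LINE SEPARATOR)
// and U+2029 (PARAGRAPH SEPARATOR) in a JSON text. Both are legal inside
// JSON strings but were line terminators in pre-ES2019 string literals.
function findLineSeparators(jsonText) {
  const positions = [];
  for (let i = 0; i < jsonText.length; i++) {
    const c = jsonText.charCodeAt(i);
    if (c === 0x2028 || c === 0x2029) {
      positions.push(i);
    }
  }
  return positions;
}

// The bytes from the example: E2 80 A8 decodes to the single character U+2028.
const decoded = new TextDecoder("utf-8").decode(
  new Uint8Array([0x46, 0x6f, 0x6f, 0xe2, 0x80, 0xa8, 0x42, 0x61, 0x72])
);
const json = '{"text":"' + decoded + '"}';
console.log(findLineSeparators(json)); // [ 12 ]
```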

There are a couple of workarounds. The easiest is to escape non-ASCII characters with \uXXXX notation on the server side. In JSON::XS that's as simple as JSON::XS->new->ascii->encode($data) instead of ->utf8. It's a little odd to use this option, since it turns ALL non-ASCII characters into \uXXXX escapes, not only "special" ones like U+2028, but it's probably the most straightforward fix anyway.
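The server here is Perl, so this is only an analogy: a minimal sketch of both approaches in JavaScript, assuming a Node.js server that encodes with the built-in `JSON.stringify` (which, like JSON::XS in utf8 mode, leaves U+2028 unescaped). Both function names are hypothetical. `toAsciiJson` approximates the ascii behavior by escaping every UTF-16 code unit above 0x7F; `escapeLineSeparators` is a narrower alternative that escapes only U+2028 and U+2029.

```javascript
// Like ->ascii: escape every code unit above 0x7F as \uXXXX. Characters
// outside the BMP become surrogate-pair escapes, which is still valid JSON.
function toAsciiJson(data) {
  return JSON.stringify(data).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// Narrower fix: escape only the two characters eval() chokes on. In
// JSON.stringify output they can only occur inside strings, so a plain
// replace is safe.
function escapeLineSeparators(data) {
  return JSON.stringify(data)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const data = { text: "Foo\u2028Bar ü" };
console.log(toAsciiJson(data));          // {"text":"Foo\u2028Bar \u00fc"}
console.log(escapeLineSeparators(data)); // {"text":"Foo\u2028Bar ü"}
```

The narrow version keeps payloads smaller for mostly non-ASCII text (Japanese, for instance, stays 3 bytes per character in UTF-8 instead of 6 as an escape), at the cost of having to remember the special case.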

The other fix is to not use jQuery's built-in JSON handling, which is just eval(). Instead, fetch the response as text with dataType: 'text' and decode it with a token-based JSON decoder such as Crockford's. There might be jQuery plugins that do this as well: jquery-json has an evalJSON() method, but hm, it seems to use eval() as well. And a decoder written in JavaScript would make parsing significantly slower anyway.
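A client-side sketch of the text-then-parse route, assuming the browser has a native `JSON.parse` (not yet a given everywhere in early 2009). Note that the fallback branch swaps in a different technique than a token-based decoder: it escapes U+2028/U+2029 and still calls eval(), so it is only appropriate for responses you trust. The helper name `parseJsonText` and the `/data.json` URL are hypothetical.

```javascript
// Hypothetical helper for JSON text fetched with dataType: 'text'.
function parseJsonText(text) {
  if (typeof JSON !== "undefined" && typeof JSON.parse === "function") {
    // JSON.parse follows the JSON grammar, so raw U+2028/U+2029 inside
    // strings are accepted, and it never executes code from the response.
    return JSON.parse(text);
  }
  // Fallback for engines without native JSON: escape the two separators so
  // the text is also a valid JavaScript expression, then eval it.
  // eval() runs arbitrary code; use this path only for trusted responses.
  var safe = text.replace(/\u2028/g, "\\u2028").replace(/\u2029/g, "\\u2029");
  return eval("(" + safe + ")");
}

// Usage with jQuery (not run here):
// $.ajax({ url: "/data.json", dataType: "text",
//          success: function (text) { var data = parseJsonText(text); } });
```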

See also Comparing JSON modules and whitespaces