Re: Percent-encoding URIs in Perl - Mark Stosberg
utf8::encode $_[0] if utf8::is_utf8 $_[0];
utf8::encode if utf8::is_utf8 is a bug. Don’t do it.
There are reasons URI::Escape provides two functions, uri_escape and uri_escape_utf8. The former handles arbitrary byte strings, whether they are UTF-8 or not. The latter treats its argument as a string of (possibly wide) characters and encodes it as UTF-8 before escaping.
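A minimal sketch of that split, assuming URI::Escape and the core Encode module are installed. The sample string and variable names are mine, for illustration only:

```perl
use strict;
use warnings;
use Encode qw(encode);
use URI::Escape qw(uri_escape uri_escape_utf8);

# Illustrative sample: "cafe" with e-acute, a space, and U+263A, as characters.
my $chars = "caf\x{E9} \x{263A}";

# Character string in: uri_escape_utf8 encodes to UTF-8, then escapes.
my $via_utf8_fn = uri_escape_utf8($chars);

# Byte string in: encode explicitly, then let uri_escape escape the bytes.
my $via_bytes = uri_escape(encode('UTF-8', $chars));

print "$via_utf8_fn\n";    # caf%C3%A9%20%E2%98%BA
print "$via_bytes\n";      # caf%C3%A9%20%E2%98%BA

# uri_escape cannot guess an encoding for a wide character, so it croaks.
my $ok = eval { uri_escape($chars); 1 };
print "uri_escape refused the wide character\n" unless $ok;
```

The caller decides whether the data is characters or bytes. Neither function looks at the utf8 flag to decide for you.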
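Doing utf8::encode based on the utf8 flag is just wrong. The flag only tells you how Perl stores the scalar internally, not whether it holds characters or bytes. A string containing characters in the Latin-1 range (0x80 to 0xFF) can be stored either way, so it will be encoded or left alone depending on its internal representation, unless you explicitly call utf8::upgrade on it before passing it to the function.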
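To make the failure concrete, here is a small sketch using only the built-in utf8:: functions. The helper encode_if_flagged is hypothetical. It just reproduces the flag-based pattern so the effect can be seen in isolation:

```perl
use strict;
use warnings;

# Hypothetical helper that reproduces the flag-based pattern criticized above.
sub encode_if_flagged {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    return $s;
}

my $downgraded = "caf\xE9";     # U+00E9 stored as a single byte, flag off
my $upgraded   = "caf\xE9";
utf8::upgrade($upgraded);       # same characters, UTF-8 internal storage, flag on

# Perl treats the two as the same string ...
print "same string\n" if $downgraded eq $upgraded;

# ... yet the flag-based helper returns different bytes for them.
my $out_down = encode_if_flagged($downgraded);
my $out_up   = encode_if_flagged($upgraded);
printf "%d bytes vs %d bytes\n", length($out_down), length($out_up);   # 4 bytes vs 5 bytes

# Encoding unconditionally treats the argument as characters every time.
my ($fix_down, $fix_up) = ($downgraded, $upgraded);
utf8::encode($_) for $fix_down, $fix_up;
print "consistent\n" if $fix_down eq $fix_up;
```

Two strings that compare equal in Perl produce different output, which is exactly the bogus encoding of Latin-1 range characters described above.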
Nothing’s wrong with URI::Escape providing a uri_escape that handles arbitrary encodings. While I agree most web pages should just use UTF-8 for everything in 2010, using other text encodings such as EUC-JP, or even arbitrary binary data (such as JPEG data), in a URL is not invalid either.
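For illustration, a short sketch of uri_escape on non-UTF-8 bytes, assuming URI::Escape and the core Encode module (with its EUC-JP support) are available. The sample text and the four JPEG header bytes are my own choices:

```perl
use strict;
use warnings;
use Encode qw(encode);
use URI::Escape qw(uri_escape);

# Illustrative sample: the two characters U+65E5 U+672C ("Japan").
my $text = "\x{65E5}\x{672C}";

# The caller chooses the encoding; uri_escape only sees the resulting bytes.
my $euc_jp = uri_escape(encode('euc-jp', $text));
print "$euc_jp\n";     # %C6%FC%CB%DC

# Arbitrary binary data works the same way, here the first bytes of a JPEG file.
my $jpeg_head = uri_escape("\xFF\xD8\xFF\xE0");
print "$jpeg_head\n";  # %FF%D8%FF%E0
```

Neither result is valid UTF-8 once unescaped, and both are still perfectly legal percent-encoded octets.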
Mark’s quote from RFC 3986 is taken out of context. It says “When a new URI scheme defines a component that represents textual data consisting of characters from [UCS] …” which doesn’t apply when we encode parameters for web URLs. That is not a new URI scheme, and the parameters don’t necessarily represent “textual data” either.
Don’t rely on utf8 flags of the strings. See perlunitut and perlunifaq for more details.