I'm migrating the front-end of a site from an old YUI2 framework to jQuery/Backbone. The PHP/MySQL back-end hasn't changed. All is well, except that UTF-8 characters sent via Backbone save (via $.ajax) are getting mangled, and I can't figure out why.
Here's what I do know:
- The backend handles UTF-8 fine. It hasn't changed as part of this rebuild. I know that's true because when I change the config to load the old YUI2 front-end, UTF-8 characters work fine. They're escaped in JavaScript using escape(string), passed via YAHOO.util.Connect.asyncRequest as JSON in an XMLHttpRequest, unescaped, and saved in the database as UTF-8, fully readable and nice.
- In the new front-end, I've added <meta charset="UTF-8"> and <meta http-equiv="content-type" content="text/html; charset=UTF-8"> to all page headers. The old front-end didn't have these settings. I only mention that because it's a difference.
- In the new front-end, UTF-8 characters work fine when I save them as a <form> submit.
- In the new front-end, the request Content-Type looks fine in the console: Content-Type: application/x-www-form-urlencoded; charset=UTF-8
How am I passing data in the new front-end?
Sometimes via a regular Backbone model.save(), other times passing data in options like this:
var text = $('#input-' + targetId).val();
var atts = {};
atts['target_id'] = targetId;
atts['user_id'] = userId;
atts['text'] = text;
var comment = new Comment(atts);
comment.save({}, {
    type: 'POST',
    url: '/api/comment?',
    data: atts,
    processData: true,
    success: function(comment, response) {
        // success handling
    },
    error: function(model, response) {
        // error handling
    }
});
So, what do these mangled special characters look like?
As entered in the input: テクス テクサン テクス テクサン
When I pass the text completely unescaped, it looks fine in the request in the console, in the Form Data section: text: テクス テクサン テクス テクサン, but it is mangled in the database as ãã¯ã¹ ãã¯ãµã³ ãã¯ã¹ ãã¯ãµã³. Perhaps this is a clue, I don't know. I've always escaped user-entered text when passing via AJAX.
When I escape(text), I get text: %u30C6%u30AF%u30B9%20%u30C6%u30AF%u30B5%u30F3%20%u30C6%u30AF%u30B9%20%u30C6%u30AF%u30B5%u30F3 in the console, and テクス%20テクサン%20テクス%20テクサン in the database.
That's better, but it's different from the old front-end, which uses escape(text), passes %u30C6%u30AF%u30B9%20%u30C6%u30AF%u30B5%u30F3%20%u30C6%u30AF%u30B9%20%u30C6%u30AF%u30B5%u30F3, shows in the console as text: (unable to decode value), and saves in the database unescaped as テクス テクサン テクス テクサン.
Of course, it's 2016 now and we all know escape() should not be used. We should use encodeURIComponent() instead. So, when I encodeURIComponent(text), here's what I get in the console: text: %E3%83%86%E3%82%AF%E3%82%B9%20%E3%83%86%E3%82%AF%E3%82%B5%E3%83%B3%20%E3%83%86%E3%82%AF%E3%82%B9%20%E3%83%86%E3%82%AF%E3%82%B5%E3%83%B3, which is saved in the database as %E3%83%86%E3%82%AF%E3%82%B9%20%E3%83%86%E3%82%AF%E3%82%B5%E3%83%B3%20%E3%83%86%E3%82%AF%E3%82%B9%20%E3%83%86%E3%82%AF%E3%82%B5%E3%83%B3.
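For reference, here is a minimal sketch of the two encoders on a shortened version of the sample text. It also shows one possible reading of the result above, under the assumption that the request serializer (for example jQuery's $.param, which runs when processData is true) already percent-encodes each value: encoding by hand then encodes twice, and a backend that URL-decodes the body once would store the %E3... form. The variable names are only for illustration.

```javascript
// Shortened sample text from the question.
var text = 'テクス テクサン';

// escape() emits non-standard %uXXXX sequences (UTF-16 code units).
var escaped = escape(text);

// encodeURIComponent() percent-encodes the UTF-8 bytes of each character.
var encoded = encodeURIComponent(text);

// Assumption: the serializer encodes the value again before sending.
var twice = encodeURIComponent(encoded);

console.log(escaped);
console.log(encoded);
console.log(twice.indexOf('%25E3') === 0);         // '%' itself became '%25'
console.log(decodeURIComponent(twice) === encoded); // one decode leaves %E3...
```

If that assumption holds, passing the raw string and letting the serializer encode it once would be the thing to compare against.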
That technically works, and I can always decodeURIComponent when displaying this text, but that's a real pain and it's just masking the issue.
I've also tried unescape(encodeURIComponent(text)), with the following result: text: ãã¯ã¹ ãã¯ãµã³ ãã¯ã¹ ãã¯ãµã³ in the console, and ãÂÂã¯ã¹ ãÂÂã¯ãµã³ ãÂÂã¯ã¹ ãÂÂã¯ãµã³ in the database.
It seems that there's some sort of double-encoding going on, or perhaps the back-end was built to handle the specific format that's passed via the YUI2 Async request. I don't know.
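One way to test the double-encoding idea is to check whether the mangled value is simply the UTF-8 bytes of the input read back as a single-byte (Latin-1) string. The sketch below does that in the browser console or Node; utf8BytesAsLatin1 and stripC1 are hypothetical helper names, and stripping the C1 control characters is my assumption about why some bytes show up as nothing in the database view.

```javascript
// Hypothetical helper: reinterpret a string's UTF-8 bytes as Latin-1 characters.
function utf8BytesAsLatin1(str) {
  // encodeURIComponent yields one %XX per UTF-8 byte; unescape maps each %XX
  // to the character with that code point (0x00-0xFF).
  return unescape(encodeURIComponent(str));
}

// Hypothetical helper: drop C1 control characters (0x80-0x9F), which many
// viewers render as nothing at all.
function stripC1(str) {
  return str.replace(/[\u0080-\u009f]/g, '');
}

var original = 'テクス';
var mangled = stripC1(utf8BytesAsLatin1(original));
console.log(mangled); // visible characters left after the misreading

// Reversing the misreading recovers the original text.
var recovered = decodeURIComponent(escape(utf8BytesAsLatin1(original)));
console.log(recovered === original);
```

For テクス this leaves the same visible characters as the unescaped case in the database (ãã¯ã¹), which would point at the bytes arriving intact and being interpreted with a single-byte character set somewhere between the request and storage. That is only a diagnosis sketch, not a confirmed cause.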
Any ideas for what I should try next? What are the best practices?