I'm attempting to send a string from client-side JavaScript to a back-end PHP script. The string contains typographic (curly) quotes like ‘ and ’.

When I look at the console in Chrome, I can see that these are sent unchanged in the POST request. On the PHP side I immediately json_encode() the $_POST array and send it back to see what it has collected. The special characters now look like this: \u2019. Please note this is only for testing; I would normally sanitize all POST data.
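
For reference, a `\u2019` escape in JSON is only another spelling of the same character, so the round trip through json_encode() is lossless once the client parses the response. A minimal sketch, assuming a payload with a made-up `text` field (the field name and sample sentence are illustrative, not from the original code):

```javascript
// What PHP's json_encode() emits by default for a right single quote (U+2019):
// the six ASCII characters \u2019 inside the JSON string.
const escapedJson = '{"text":"It\\u2019s fine"}';

// The same payload with the character left as literal text, which is what
// json_encode($data, JSON_UNESCAPED_UNICODE) would produce in PHP 5.4+.
const literalJson = '{"text":"It’s fine"}';

const fromEscaped = JSON.parse(escapedJson).text;
const fromLiteral = JSON.parse(literalJson).text;

// Both spellings decode to the identical JavaScript string.
console.log(fromEscaped === fromLiteral);
// The third character is the curly quote, code point U+2019.
console.log(fromEscaped.charCodeAt(2).toString(16));
```

If the escaped form is only a readability concern while debugging, the `JSON_UNESCAPED_UNICODE` flag on the PHP side keeps the characters literal in the output.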

I wish to use UTF-8 but I'm not sure what I'm missing. My HTML includes:

<meta charset="utf-8">

My PHP server has UTF-8 set as its default charset.

If I save such data to the database, I end up with strings like â€™ in place of ’. However, this is not a database issue: the characters are already bad before they reach the database. MySQL merely makes the problem more visible.
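
A leading `â` is the usual sign that UTF-8 bytes were decoded as a single-byte encoding somewhere along the way. The sketch below reproduces that effect for U+2019 and then reverses it. It assumes strict ISO-8859-1 for the misreading, where the second and third bytes become invisible control characters; under Windows-1252 the same bytes display as € and ™.

```javascript
const original = '’'; // U+2019, right single quotation mark

// 1. Encode as UTF-8: this character takes three bytes.
const utf8Bytes = new TextEncoder().encode(original);
const hex = Array.from(utf8Bytes, (b) => b.toString(16)).join(' ');
console.log(hex); // e2 80 99

// 2. Misread every byte as one ISO-8859-1 character (byte value = code point).
//    0xE2 is "â"; 0x80 and 0x99 are C1 control characters in ISO-8859-1.
const misread = String.fromCharCode(...utf8Bytes);
console.log(misread.length); // 3 characters instead of 1

// 3. Undo the damage: turn the characters back into bytes, decode as UTF-8.
const recoveredBytes = Uint8Array.from(misread, (ch) => ch.charCodeAt(0));
const repaired = new TextDecoder('utf-8').decode(recoveredBytes);
console.log(repaired === original); // true
```

The repair in step 3 only works while no layer has replaced the unknown bytes with `?`, so it is a diagnostic aid rather than a fix; the real fix is to declare UTF-8 consistently on every hop.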

Any ideas?

Update

I've noticed that if I return the string to JavaScript without using json_encode(), it is still in its original format, with the special quotes (‘ and ’) intact.

diggersworld
  • `` instead of what you have for the meta tag? – Jon Mar 08 '13 at 16:32
  • That's just the correct output, UTF-8 characters should be encoded like `\uXXXX` in JSON. @Jon: That's the same, only in HTML 5 syntax. BTW, don't sanitize your POST data, use the correct escaping/encoding when outputting/using the data. – Marcel Korpel Mar 08 '13 at 16:32
  • It wasn't specified, so I assumed XHTML or HTML 4.x ^^ But @MarcelKorpel is right that this is the correct output of `json_encode`. – Jon Mar 08 '13 at 16:35
  • possible duplicate of [Strange Characters in database text: Ã, Ã, ¢, â‚ €,](http://stackoverflow.com/questions/7861358/strange-characters-in-database-text-a-a-a) – AlexV Mar 08 '13 at 16:45
  • Your database problem seems more like an encoding issue. Do you use UTF-8 as collation? – Marcel Korpel Mar 08 '13 at 16:45
  • See my answers for this duplicate: [Strange Characters in database text: Ã, Ã, ¢, â‚ €,](http://stackoverflow.com/questions/7861358/strange-characters-in-database-text-a-a-a/7889628#7889628) & [problem with special characters](http://stackoverflow.com/questions/3881911/problem-with-special-characters/3882072#3882072) – AlexV Mar 08 '13 at 16:47
  • Database charset is set to UTF-8. I have looked at the point where I pass the string to the database via a Java backend. The string at that point already contains bad characters. So the issue occurs prior to database entry. – diggersworld Mar 08 '13 at 16:53
  • How do you know the characters are already bad before going into the database? – Marcel Korpel Mar 08 '13 at 17:29
  • @MarcelKorpel because when debugging the Java back-end I can see all incoming POST data. It is at that point I can see the string containing invalid characters (before it gets to the DB). – diggersworld Mar 08 '13 at 17:34
  • But it is correct when you directly output the POST data to the browser? Strange. Is there something wrong when the data goes from PHP to Java? – Marcel Korpel Mar 08 '13 at 17:36
  • I suppose there's a possibility the Java may receive it incorrectly. – diggersworld Mar 08 '13 at 17:38
  • Can't you directly access your database from within PHP? – Marcel Korpel Mar 08 '13 at 17:41
  • No, the system is set up in such a way that there is a Java API layer responsible for setting and getting data. Then there's a PHP layer which is responsible for the front end. – diggersworld Mar 08 '13 at 17:47

2 Answers

Have you tried `utf8_decode()` on the server side for the variables you're passing? PHP is likely expecting ISO-8859-1 rather than UTF-8.

Tom Walters
  • That appears to change the values to `?` characters. – diggersworld Mar 08 '13 at 16:29
  • Curious, just found [this question](http://stackoverflow.com/questions/6616240/corrupted-characters-when-jquery-ajax-sends-to-php-utf-8-iso-8859-incompatibi) along the same lines. – Tom Walters Mar 08 '13 at 16:30
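
The `?` characters reported in the comment above are expected: `utf8_decode()` converts UTF-8 to ISO-8859-1, and curly quotes have no ISO-8859-1 representation. The sketch below imitates that behaviour with a hypothetical helper, `toLatin1Lossy` (not a PHP or browser API), which returns a string with one character per output byte instead of raw bytes.

```javascript
// Hypothetical helper imitating the lossy UTF-8 -> ISO-8859-1 conversion:
// code points up to U+00FF survive, everything else becomes "?".
function toLatin1Lossy(text) {
  let out = '';
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint <= 0xff ? ch : '?';
  }
  return out;
}

// U+00E9 (é) exists in ISO-8859-1, so it is preserved.
console.log(toLatin1Lossy('caf\u00e9'));
// U+2019 (’) does not, so it is replaced.
console.log(toLatin1Lossy('It\u2019s')); // It?s
```

This is why converting to ISO-8859-1 cannot fix the problem here: the replacement is irreversible, so the data has to stay UTF-8 end to end.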

It turns out there was an issue on both sides of the pond. The issue on the PHP side, which this question concerns, was that the data was being sent to the back-end via a GET request (URL-encoded). I have changed this to a POST request.

This has allowed me to specify the UTF-8 charset when sending the headers for the POST request.
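
The actual change was made in the PHP layer that calls the Java back-end, and that code is not shown here. As an illustration of the same idea only, the sketch below builds a form-encoded POST request with an explicit UTF-8 charset in JavaScript. The endpoint URL, the `message` field name, and both function names are assumptions.

```javascript
// Hypothetical endpoint; replace with the real back-end URL.
const ENDPOINT = 'https://example.com/api/messages';

// Build the request options without sending anything.
function buildPostRequest(text) {
  // URLSearchParams percent-encodes non-ASCII characters as UTF-8 bytes.
  const body = new URLSearchParams({ message: text }).toString();
  return {
    method: 'POST',
    headers: {
      // State the charset explicitly so the receiver does not have to guess.
      'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    },
    body,
  };
}

// Send the request (not called here; requires a fetch-capable runtime).
async function sendMessage(text) {
  const response = await fetch(ENDPOINT, buildPostRequest(text));
  return response.text();
}

console.log(buildPostRequest('It’s').body); // message=It%E2%80%99s
```

The receiving side still has to decode the body as UTF-8; declaring the charset in the header only helps if the back-end honours it.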

diggersworld