
I'm using jQuery to serialize and post form values to a PHP/MySQL server. The form's textarea contains text and emojis as HTML entities:

<textarea>Hello &#x1f600; and goodbye</textarea>

The server receives:

Hello \ud83d\ude00 and goodbye

However, only "Hello" gets stored in the database correctly. The emoji/entity and any text after it disappear. The database gets:

Hello

Everything is UTF-8 throughout.

What's the right way to parse this into something that can be stored and then output back into HTML so that it renders properly? I must be overlooking something simple.

Tom
  • Try encoding as `utf8mb4` – Andy Foster Feb 20 '18 at 13:48
  • @AndyFoster You mean at database level? – Tom Feb 20 '18 at 14:02
  • In PHP are you using PDO with parameterized queries? Can we see that code? – Matt S Feb 20 '18 at 14:13
  • @MattS It's nothing more complicated than INSERT ... VALUES ( ' ".$escaped." ') – Tom Feb 20 '18 at 14:20
  • What's the escaped value? My guess is it's getting to Mysql as an invalid character or escape sequence. – Matt S Feb 20 '18 at 14:23
  • @MattS The escaped value appears to be exactly what reaches the server, with the escaping doing nothing to it. I'm trying to manipulate the string back into HTML entities prior to saving (which I haven't managed yet), but I'm also aware that there's probably a better solution. – Tom Feb 20 '18 at 14:35
  • @Tom yes. SQL will truncate the insert at the first 4-byte Unicode character. I referred to [this blog](https://blog.arkency.com/2015/05/how-to-store-emoji-in-a-rails-app-with-a-mysql-database/) – Andy Foster Feb 21 '18 at 15:00
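For context on the "4-byte" remark in the last comment: the sequence `\ud83d\ude00` that reached the server is a UTF-16 surrogate pair for the single code point U+1F600, and that code point needs four bytes in UTF-8. MySQL's legacy `utf8` character set stores at most three bytes per character, which is why `utf8mb4` was suggested. A minimal sketch to check this in JavaScript (the variable names are only illustrative):

```javascript
// The escape sequence the server received, written as a JavaScript string literal.
const emoji = "\ud83d\ude00";

// JavaScript strings are UTF-16, so the emoji occupies two code units (a surrogate pair)...
const codeUnits = emoji.length;
// ...that together encode one code point.
const codePoint = emoji.codePointAt(0).toString(16);

// Encoded as UTF-8, that single code point takes four bytes.
const utf8Bytes = new TextEncoder().encode(emoji);

console.log(codeUnits, codePoint, utf8Bytes.length);
```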

1 Answer

Solved.

Either change the database/table character set and collation to `utf8mb4`, as Andy Foster suggested in the comments below the question, or JSON-encode the string before saving and JSON-decode it after reading, so that it is stored as a plain string.

$encoded = json_encode($str);      // before saving to the database
$decoded = json_decode($encoded);  // after reading it back
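A likely reason the JSON route works is that PHP's `json_encode` by default writes non-ASCII characters as `\uXXXX` escapes, so what reaches the database is pure ASCII and a three-byte `utf8` column has nothing to choke on. The same idea can be sketched in JavaScript; `toAsciiJson` is a hypothetical helper written for this illustration, not part of any library:

```javascript
// Hypothetical helper: JSON-encode a string, then escape every non-ASCII
// UTF-16 code unit as \uXXXX so the result contains only ASCII characters.
function toAsciiJson(str) {
  return JSON.stringify(str).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const original = "Hello \u{1F600} and goodbye";

// What would be written to the database: ASCII only, quotes included.
const stored = toAsciiJson(original);

// What comes back after decoding: the original string, emoji intact.
const restored = JSON.parse(stored);

console.log(stored);
console.log(restored === original);
```

Because the stored value is ASCII, the round trip does not depend on the column's character set, at the cost of storing escape sequences rather than the characters themselves.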

In my case, there were three things affecting encoding/rendering:

1) The initial submit using jQuery serialize(), which converted the emoji into a Unicode escape sequence.

2) Saving the Unicode escape sequence into the database. I used JSON encoding to get past this.

3) Output escaping when rendering the JSON-decoded emoji string back into an HTML template.
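For the third point, one option is to turn the decoded string back into HTML numeric entities while escaping it for output, which matches the form the emoji had in the original textarea. This is only a rough sketch under that assumption; `toHtmlEntities` is a hypothetical helper, and a real template engine's escaping may behave differently:

```javascript
// Hypothetical helper: escape HTML special characters and turn every
// non-ASCII code point into a hexadecimal numeric character reference.
function toHtmlEntities(str) {
  const special = new Map([
    ["&", "&amp;"],
    ["<", "&lt;"],
    [">", "&gt;"],
    ['"', "&quot;"],
  ]);
  let out = "";
  // for...of walks the string by code point, so a surrogate pair stays together.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (special.has(ch)) {
      out += special.get(ch);
    } else if (cp > 0x7f) {
      out += "&#x" + cp.toString(16) + ";";
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(toHtmlEntities("Hello \u{1F600} and goodbye"));
```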

Tom
  • Why would you remove all the code from your answer? – GrumpyCrouton Feb 21 '18 at 17:10
  • @GrumpyCrouton After spending a bit more time with this, I realise the code I posted is specific to a purpose but not necessarily useful for every scenario. I've updated the answer with a bit more information. – Tom Feb 21 '18 at 21:36
  • When using `json_encode`, add `JSON_UNESCAPED_UNICODE` as the second argument. – Rick James Feb 22 '18 at 01:12
  • @RickJames Funnily enough, using JSON_UNESCAPED_UNICODE breaks it again and the string doesn't get saved properly. – Tom Feb 22 '18 at 11:30
  • @Tom - Then that needs solving. See "best practice" in [_here_](https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored) – Rick James Feb 22 '18 at 12:49