1

I have a web service which inserts various user input into a MySQL DB.

My DB insert statement uses $db->real_escape_string() for security on all input.

However this seems to be stripping out emoji references from input, e.g. in the format \ud83d\ude2e (winking face).

The pertinent DB column has collation utf8_general_ci, as does the table.

Example:

  • Before insert: "Foo bar! \ud83d\ude2e"
  • After insert (via real_escape_string()): "Foo bar! "

Perhaps real_escape_string() is supposed to pull out things like this, for all I know, and so my problem may be obvious to people more knowledgeable in this field.

How can I allow the retention of the emoji references in the DB insert?

Mitya
  • 33,629
  • 9
  • 60
  • 107
  • Well, you should use prepared statements instead for security/safety... but, you should also let us know what the collation and charset is for your table fields and database. – IncredibleHat Mar 03 '18 at 14:35
  • 1
    https://mathiasbynens.be/notes/mysql-utf8mb4 – CBroe Mar 03 '18 at 14:36
  • 1
    This is somewhat legacy code. I'm doing plenty of security checks before it gets anywhere near the insert stage. Updated question with collation info. – Mitya Mar 03 '18 at 14:36
  • 1
    Thanks for the link - shame I didn't find that (or the duplicate question) in my searching beforehand. – Mitya Mar 03 '18 at 14:37
  • 1
    Ok, legacy code, perfectly understandable... I've been there... still there in some cases. ;) – IncredibleHat Mar 03 '18 at 14:37
  • The linked answer recommends changing the collation from utf8_general_ci to utf8mb4, but I can't find any information on the potential ramifications of this for existing data. Can anyone comment? Would I expect adverse effects on any saved special characters in the database or anything like that? – Mitya Mar 03 '18 at 14:43
  • Answering my own question (previous comment). From the [MySQL docs](https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html): *"For a supplementary character, utf8 cannot store the character at all, whereas utf8mb4 requires four bytes to store it. Because utf8 cannot store the character at all, you have no supplementary characters in utf8 columns and need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL. "* – Mitya Mar 03 '18 at 14:48
  • Right @Utkanos ... upconverting to utf8mb4 should not need actual data conversions (not like going from iso to utf8 omg that was hell). But that article is old in that it does not mention about using the 520 collation. Also I think its a bit off about the varchar(191) issue... I thought that was fixed somewhere along the lines when they introduced 520. – IncredibleHat Mar 03 '18 at 15:12
  • @IncredibleHat - excellent, thanks. I'll do the conversion and hope for the best! – Mitya Mar 04 '18 at 12:16

0 Answers0