
I have a MySQL database with a table that uses the latin1_swedish_ci collation. One of its text fields contains values such as %u013A,Ok%20bro and %uD83D%uDE02%uD83D%uDE02. I understand that this data contains encoded special characters such as spaces and emojis. How can I properly display these texts in PHP? I have tried numerous PHP functions such as urldecode() and utf8_encode(), to no avail. Please help.

Edit: The table is populated by a Node.js server that applies the function escape() before inserting into the table. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/escape

Kevin
  • Does this answer your question? [How to convert latin1\_swedish\_ci data into utf8\_general\_ci?](https://stackoverflow.com/questions/12756877/how-to-convert-latin1-swedish-ci-data-into-utf8-general-ci) – MertDalbudak Mar 20 '21 at 14:48
  • @MertDalbudak Nope. – Kevin Mar 20 '21 at 15:03

1 Answer

I recognize %20 as the encoding of a space used in URL queries. You would need rawurldecode() to decode that.
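As a small illustration of that first step, here is what rawurldecode() does with one of the sample values from the question. The variable names are only for illustration. Note that the %uXXXX sequence is left untouched, because rawurldecode() only decodes a percent sign followed by two hex digits:

```php
<?php
// Sample value taken from the question.
$stored = '%u013A,Ok%20bro';

// rawurldecode() turns %20 into a space but leaves %u013A as-is,
// since "%u0" is not a valid two-digit percent escape.
$decoded = rawurldecode($stored);

echo $decoded, PHP_EOL; // prints: %u013A,Ok bro
```

rawurldecode() is used rather than urldecode() because urldecode() would also turn a literal + into a space, and escape() leaves + unencoded.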

I did find an answer to decode the other characters here and with a bit of extra code I managed to decode yours:

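function decodeEmoticons($src) {
    // Turn each \uXXXX escape into a hexadecimal HTML entity (&#xXXXX;).
    $replaced = preg_replace("/\\\\u([0-9A-F]{1,4})/i", "&#x$1;", $src);
    // Decode the entities to UTF-16 so that surrogate pairs end up side by side.
    // Note: the "HTML-ENTITIES" encoding is deprecated as of PHP 8.2.
    $result = mb_convert_encoding($replaced, "UTF-16", "HTML-ENTITIES");
    // Convert UTF-16 to UTF-8, which combines each surrogate pair into one character.
    $result = mb_convert_encoding($result, 'utf-8', 'utf-16');
    return $result;
}
$r = "%uD83D%uDE02%uD83D%uDE02";
// Rewrite %uXXXX as \uxxxx before decoding.
echo decodeEmoticons(str_replace('%','\\',strtolower($r)));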

This returns "😂😂".

So perhaps you need two decodings? First the rawurldecode() followed by the code above. Some experimenting is needed, which I cannot do because you didn't give a lot of text to work with.
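If the data really does come from JavaScript's escape(), as the edit to the question says, the two steps could also be folded into a single pass. The following is only a sketch under that assumption, not a drop-in replacement: the function name unescapeJs is hypothetical, it needs the mbstring extension, and it has only been checked against the sample values from the question. It relies on the fact that escape() writes UTF-16 code units below 256 as %XX and all others as %uXXXX, so each token can be packed as one big-endian UTF-16 code unit and the whole string converted to UTF-8 at the end:

```php
<?php
// Hypothetical helper: reverses JavaScript's escape() and returns UTF-8.
function unescapeJs(string $src): string
{
    $utf16 = preg_replace_callback(
        // %uXXXX, or %XX, or any other single byte (escape() output is ASCII).
        '/%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})|(.)/s',
        function (array $m): string {
            if ($m[1] !== '') {
                $unit = hexdec($m[1]);      // %uXXXX: a full UTF-16 code unit
            } elseif (($m[2] ?? '') !== '') {
                $unit = hexdec($m[2]);      // %XX: a code unit below 256
            } else {
                $unit = ord($m[3]);         // literal character, kept as-is
            }
            return pack('n', $unit);        // 16-bit big-endian
        },
        $src
    );

    // Surrogate pairs such as D83D DE02 are combined during this conversion.
    return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
}

echo unescapeJs('%u013A,Ok%20bro'), PHP_EOL;          // prints: ĺ,Ok bro
echo unescapeJs('%uD83D%uDE02%uD83D%uDE02'), PHP_EOL; // prints: 😂😂
```

One difference from chaining rawurldecode() with decodeEmoticons(): a sequence such as %E9 is treated here as the character é (U+00E9) rather than as a raw byte, which matches how escape() encodes characters in the Latin-1 range.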

The best way to go about this is to find out how the text was encoded.

KIKO Software
  • This works! A visual inspection of applying `rawurldecode()` followed by `decodeEmoticons()` seems fine on all records of the table. From what I understand, the table is being populated from a `node js` server that applies the function `escape()` before inserting to the table. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/escape – Kevin Mar 20 '21 at 15:08
  • In that case the best thing to use would be the [querystring.unescape(str)](https://nodejs.org/api/querystring.html#querystring_querystring_unescape_str) from Node JS, but I can understand that this is not possible. Using something else always runs the risk of not always decoding it correctly. – KIKO Software Mar 20 '21 at 15:32