0

I am storing an emoji as part of a string in a text field in MySQL:

<div><span id="emoji_1f600">&#x1f600</span></div>

The field in MySQL has utf8_general_ci set. When the data is stored into MySQL the field, the data now looks like this:

<div><span id="emoji_1f600">😀</span></div>

I am assuming that is because of how the emoji is stored. Please educate me if I am wrong on this point, as I thought I would have seen the unicode of &#x1f600 instead of the strange characters.

I then fetch the data from the MySQL field into a php var and do a substring to get just the actual emoji between the span tags. The value in the php var now looks like this:

"C0E8Kb,"

My code makes an attempt to get the unicode back by doing the following:

$code = utf8_encode($code) //$code contains the string "C0E8KB,"

The result is "CB0CB8CBC"BB,"

I am obviously not handling the emoji utf8 code properly and welcome any and all help and instruction.

Thanks in advance.

I don't really need UTF8 all the way through. Just on one field. Which the field in MySOL is typed to be utf8.

Ok I made a major mistake in my problem description. It is true that my code is producing the following html

<div><span id="emoji_1f600">&#x1f600</span></div>

However, this html is within an editor from a 3rd party and the emoji code within my span tag is actually being rendered as an emoji. So when I save the data from the editor, what I get back from the editor is the following:

<div>test 2 <span id="emoji_1f600">😀</span></div>

I am assuming the strange chars between the span tags is the actual emoji, since it is being rendered. Is this ok as is, or should I be replacing that with the actual &#x1f600 code, prior to storing it in the database? My fear is that if I do that, then the actual emoji will not get rendered when I place the string from the database into an html string to be rendered.

Tim
  • 273
  • 2
  • 10
  • 2
    https://stackoverflow.com/questions/279170/utf-8-all-the-way-through – arkascha Jul 14 '17 at 14:01
  • Possible duplicate of [UTF-8 all the way through](https://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – user3942918 Jul 14 '17 at 14:18
  • thank you for the information. I understand fully, but this is not the path I want to pursue. So could you possibly tell me the best mechanism for or technique for embedding an emoji in a string without requiring the a total conversion to utf8? – Tim Jul 15 '17 at 01:47

2 Answers2

0

utf8_encode should not be used as your DB is already UTF-8 ; it encodes from ISO-8859-1 (often found with MySQL) to UTF-8 ; it may produce bad chars if your data is already utf-8 encoded. Is the html page containing the data that you want to store declared as utf-8 ? Something like this :
<head> <meta charset="UTF-8"> </head>

I was bored so I tried the following code with no issue :

`<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
    <title></title>
</head>
<body>
<div><span id="emoji_1f600">&#x1f600</span></div>
<?php 
$mysqli=new mysqli("127.0.0.1", "root", "","utf8_general_mysql");
$num=1;
$text="&#x1f600";
$stmt = $mysqli->prepare("INSERT INTO testtable VALUES (?, ?)");
$stmt->bind_param('ds', $num, $text);
$stmt->execute();
echo '<div><span id="emoji_1f600">&#x1f600</span></div>';
$stmt = $mysqli->prepare("SELECT * FROM testtable WHERE testtable.text='&#x1f600'");
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_array(MYSQLI_NUM))
        {
            foreach ($row as $r)
            {
                print "$r ";
            }
            print "\n";
        }

?>
</body>
</html>`

Edit ... :

I really think it has to do with your headers content-type :

try to add : header('Content-type: text/html; charset=utf-8'); then try header('Content-type: text/html; charset=iso-8859-1'); (this is how you seem to be set) on the page you are inserting data to MySQL, here are the 2 different rows : enter image description here

I think meta charset does not work because http headers can be set elsewhere, these PHP lines should do the trick, hopefully.

To have these rows, i had to set the headers and replace the previous $text value with $text="" into my code sample.

knrf
  • 89
  • 6
  • Thank you you have helped me better understand the problem. I, however, have made an error when describing the problem that maybe you can still help with. Please see my updated problem description. I genuinely appreciate your help! – Tim Jul 15 '17 at 12:31
  • Could you describe the way you store it in the db ? I think you should store it as it is, "😀", if possible but i'm not sure what is restraining you from doing that; if you just render it as html, your browser should be able to display the emoji by itself. – knrf Jul 15 '17 at 13:42
  • Yes that is what I want to do. However, the editor (as mentioned in my last comments in the problem description) is handing me back the 😀. How do I convert that back to the 😀 using php? If I could do that then my problem would be solved. – Tim Jul 17 '17 at 12:07
  • I should add that I believe the 😀 string is the actual emoji, so there may be now way to get the code back. I just don't know. Appreciate the continued help. – Tim Jul 17 '17 at 12:34
  • Also, on the js side, I see the emoji. However, once the string is serialized and passed to the php code, the post var holding the string containing the emoji is now what has the 😀 within the string. – Tim Jul 17 '17 at 13:46
0

Your problem is assuming that MySQL's characterset called utf8 is actually utf8. It isn't. MySQLs utf8 is a 3-bytes subset of utf8 that does not cover emojis. In order to tell MySQL to not corrupt your data in the future, and give an error instead when invalid characters are given for the row, enable the STRICT_TRANS_TABLES sql_mode. In order to make mysql use the real 4-byte utf8, make the row characterset "utf8mb4" - in short, mysqls utf8 is a retardedly named utf8 subset, and the real utf8 is called utf8mb4 in MySQL. (This is also true for MariaDB btw, which inherited this brain damage from the MySQL source code it was forked from)

hanshenrik
  • 19,904
  • 4
  • 43
  • 89