0

I have a Mysql database having data below:

Thermometer -10℃ - 110℃

I copied that degree symbol from other website.

I did declare on my html like some example here:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />

But it just wont show up when I call this data using php to my website, it only display

Thermometer -10?C - 110?C

What else do I need to do??

Newbee123
  • 61
  • 6
  • Try setting a different font in CSS – Adder Oct 29 '18 at 12:08
  • 2
    Declaring that you use UTF-8 in the HTML source is possibly the most irrelevant bit. You need to ensure you're actually using UTF-8 everywhere (text editor, database table, database connection...). Also, `℃` is encoded as `E2 84 83` in UTF-8. Display the raw bytes and verify at which point it gets broken. – Álvaro González Oct 29 '18 at 12:11
  • 1
    @Adder I don't think that's now how browsers normally work. If a glyph is unavailable in a given font you get a replacement character automatically. – Álvaro González Oct 29 '18 at 12:12
  • Álvaro González I use utf8_general_ci in the table. – Newbee123 Oct 29 '18 at 12:14
  • what do u mean 'Display the raw bytes and verify '? How to do it? – Newbee123 Oct 29 '18 at 12:15
  • My editor default encode in utf-8 too.. – Newbee123 Oct 29 '18 at 12:18
  • @ÁlvaroGonzález My reasoning is that `℃` is actually a single character. If it shows up as `?C` then the unicode should be correct and the bug should be elsewhere – Adder Oct 29 '18 at 12:20
  • It was display correctly in my mysql table. But when I CRT V it in here with after 4 space, it become like that, I think the editor in stack overflow has its own way of encoding too... – Newbee123 Oct 29 '18 at 12:21
  • @ÁlvaroGonzález it did show up up as ?C. So where else can be faulty?? – Newbee123 Oct 29 '18 at 12:24
  • I mean stuff like [HEX()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex), [bin2hex()](http://php.net/bin2hex), [hexadecimal editors](http://hexed.it), etc. – Álvaro González Oct 29 '18 at 13:50
  • 1
    Fonts and CSS are unlikely to have any impact here. – Rick James Oct 29 '18 at 16:44
  • `COLLATION utf8_general_ci` implies `CHARACTER SET utf8` which should have both E28483 and C2B0. – Rick James Oct 29 '18 at 16:51
  • [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) – deceze Oct 30 '18 at 05:21

1 Answers1

2

I'm hearing that the single Unicode character (hex E28483) is turning into the two Unicode characters °C (hex C2B0 43). Let's verify this. If encoded as latin1, the latter hex will be B0 43.

If the character(s) are in the database, then do

SELECT col, HEX(col) FROM ...

If they are in PHP, then do

echo bin2hex($str);

Then report back which hex you get.

This discusses why the two character version could turn into ?C. It suggests:

  • The bytes to be stored are not encoded as utf8/utf8mb4. In particular, hex B0 is the latin1 encoding for °.
  • The column in the database is not CHARACTER SET utf8 (or utf8mb4). Fix this. (Use SHOW CREATE TABLE.)
  • Also, check that the connection during reading is UTF-8.

Meanwhile, there is nothing (that I know of) in either MySQL or PHP that would turn the one-char encoding into the two-char version. Is there any other process involved?

In the Unicode specification, there is a "Decomposition" of the 1-char version into the 2-char version, but I don't know what product would make use of such. Another example: Lj vs Lj

Who is converting?

If MySQL were converting from utf8 to latin1, I would expect

CONVERT(CONVERT(BINARY('℃') USING utf8) USING latin1)

to return the two-char version. But, no, it returns '?'. I have to assume some other process that the data is going through is being kind enough to convert the 1-char thing into 2-chars, perhaps then converting to latin1 (which is almost identical to cp1252 and ISO-8859-1 and ISO-8859-15)

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • I run HEX(col) it does return (C2B0 43) for °C. – Newbee123 Oct 30 '18 at 02:10
  • When I run -sql to mysql: show variables like "collation_%", it return that collation_connection is utf8mb4_bin, collation_database is utf8_unicode_ci, and collation_server is ** latin1_swedish_c**. Can this **collation_server latin1_swedish_c** be the problem?? – Newbee123 Oct 30 '18 at 02:13
  • @Newbee123 - Look for `"char%"` instead of `"collation%"`. The hex indicates that the conversion happened _before_ `INSERTing` into the database. What action happened then? – Rick James Oct 30 '18 at 03:29
  • sql to mysql: show variables like "character%", it return: Variable_name: Value character_set_client: utf8mb4, character_set_connection: utf8mb4, character_set_database: latin1, character_set_filesystem: binary, character_set_results: utf8mb4, character_set_server: latin1, character_set_system: utf8, character_sets_dir: /usr/share/mysql/charsets – Newbee123 Oct 30 '18 at 04:46
  • can it be **character_set_server: latin1 that cause the problem?? – Newbee123 Oct 30 '18 at 04:52
  • @Newbee123 - latin1 has the two characters but not the single character. MySQL will do 1:1 conversions, but not the conversion in question. What other process is involved? (I added to my Answer.) – Rick James Oct 30 '18 at 17:01
  • @Newbee123 - and... The `character_set_client/connection/results = utf8mb4` declare that the _client's_ bytes are encoded as `UTF-8`. I am pretty sure that if you have the 1-char `℃` in UTF-8 in the client, and store it into a column that is `CHARACTER SET latin1`, it will be turned into `?`. (That is, I still don't have the full answer.) – Rick James Oct 30 '18 at 17:05