
Just now, I ran into a problem that I by mere chance had not encountered before:

To support emoji in specific columns, I decided to set my mysqli_set_charset() to utf8_mb4 and changed the encoding of a few columns in my database as well.

Since then, PHP no longer correctly handles accented characters coming from normal utf8-encoded fields.

So I'm stuck with mixed utf8 and utf8mb4 results. Since my data handling is not very strong (I'm used to working with frameworks that handled it all for me), I'm quite unfamiliar with how best to resolve this.

I have thought about the following options:

1 ) set my entire database to utf8mb4, instead of utf8 with a few utf8mb4 exceptions.

2 ) use mysqli_set_charset() to switch the charset, and simply make sure the queries fetching said data are separated

Now, neither of these seem like great ideas to me, but I can't really think of any better solution.

So then there's the remaining questions:

  • Will setting my entire db to utf8mb4 instead of utf8 be a big performance change? I do realise that utf8mb4 is bigger and therefore slower, which is why I tried to only use it on the columns in question in the first place.
  • Is there a way for me to simply have PHP handle utf8 encoding well, even when the mysqli charset is set to utf8mb4?
  • Do you have a better idea?

I'm at a real loss on this subject and I honestly can't guess which option is best. Googling on it didn't help too much as it only returned links explaining the differences of it or on how to convert your database to utf8mb4, so I would very much love to hear the thoughts on this of one of the wise SO colleagues!

Columns in this specific case:

(screenshot of the column definitions; image not available)

My response including PHP's character encoding detection:

arri�n = UTF-8
bolsward = ASCII
go�nga = UTF-8
lo�nga = UTF-8
echt = ASCII
echteld = ASCII
echten (drenthe) = ASCII
echten (friesland) = ASCII
echtenerbrug = ASCII
echterbosch = ASCII

My MYSQLI charset: mysqli_set_charset($this->getConn(), "utf8mb4");

Update: I just realised the problem was with my mysqli_set_charset() call. There indeed used to be an underscore in the charset name...

NoobishPro

1 Answer


It is spelled utf8mb4 (no underscore).

See "Trouble with utf8 characters; what I see is not what I stored". In particular, read "Overview of what you should do" in the answer.

You do not need to change the entire db. It is fine to specify utf8mb4 for only selected columns.

You do need to use utf8mb4 for the connection. Outside MySQL (HTML meta tags, PHP headers) you specify 'UTF-8', which is the outside world's equivalent of MySQL's utf8mb4. MySQL's utf8 is a subset of utf8mb4. (Note: I am being precise in use of hyphens and underscores.)
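As a rough sketch of the naming point above: the two helpers below, `connection_charset_name()` and `set_connection_charset()`, are hypothetical and not part of mysqli. They only illustrate one way to reject a misspelled name such as `utf8_mb4` and to check the result of `mysqli_set_charset()` instead of silently keeping the previous connection charset. The mapping covers only the spellings mentioned in this thread.

```php
<?php
// Hypothetical helper (not part of mysqli): translate a charset label into
// the spelling MySQL accepts for a connection. Only the names that come up
// in this thread are covered.
function connection_charset_name(string $label): ?string
{
    $names = [
        'utf8'    => 'utf8',
        'utf8mb4' => 'utf8mb4',
        'utf-8'   => 'utf8mb4', // the outside world's UTF-8 is MySQL's utf8mb4
    ];

    // Unknown spellings (for example "utf8_mb4") yield null.
    return $names[strtolower(trim($label))] ?? null;
}

// Hypothetical wrapper: refuse unknown spellings and check the return value
// of mysqli_set_charset() rather than ignoring a failure.
function set_connection_charset(mysqli $conn, string $label): void
{
    $name = connection_charset_name($label);
    if ($name === null) {
        throw new InvalidArgumentException("Unknown charset name: $label");
    }
    if (!mysqli_set_charset($conn, $name)) {
        throw new RuntimeException('mysqli_set_charset failed: ' . mysqli_error($conn));
    }
}
```

With an open connection in `$conn`, `set_connection_charset($conn, 'utf8mb4')` would set the charset, while `set_connection_charset($conn, 'utf8_mb4')` would throw before anything is sent to the server.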

utf8mb4 is not bigger, nor slower for transferring characters that are in common between utf8mb4 and the utf8 subset. Emoji are 4 bytes, so they are bigger than most other characters, but you are stuck with them being 4 bytes; don't sweat it.
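To make the size argument concrete, here is a small sketch that prints how many bytes a few sample characters take in UTF-8. The sample characters are my own choice for illustration. Characters of up to 3 bytes fit in MySQL's utf8 and are encoded with the same bytes in utf8mb4; only the 4-byte ones, such as emoji, need utf8mb4.

```php
<?php
// Sample characters written as Unicode escapes, so the encoding of this
// source file does not matter (requires PHP 7 or later).
$samples = [
    'ASCII e'    => 'e',
    'accented e' => "\u{00EB}",
    'euro sign'  => "\u{20AC}",
    'emoji'      => "\u{1F600}",
];

// strlen() counts bytes, so this gives the UTF-8 size of each character.
$sizes = array_map('strlen', $samples);

foreach ($sizes as $label => $bytes) {
    echo "$label: $bytes byte(s)\n";
}
```

This prints 1, 2, 3 and 4 bytes respectively: the accented letter costs 2 bytes whichever of the two charsets the column uses.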

Rick James
  • I have read your answer on the other question and my problem is specifically with the black diamonds. Now, I have done pretty much everything you said right in the first place. My meta tags, php headers, pretty much every charset I have is set to `UTF-8`, exactly as written here. I retrieved the charset per string and got `UTF-8` back for all the strings with special characters. (`ASCII` for non-special character strings). The only thing I can think of now is that I input the data directly into the database from PHPMyadmin (through a query, though). Might that be it? I'm so lost right now. – NoobishPro Oct 08 '16 at 16:53
  • Yeeeeeep, fixed it thanks to your answer on the other post, in the end. It sunk in quite late that on `mysqli_set_charset()`, I have to specifically use `utf8` or `utf8mb4`, and not `UTF-8`. Also, I used to have the naming PHPMyAdmin used, which was obviously very stupid. (I got the underscored spelling from there, as well!) Thank you so much! You deserve a beer, my man! – NoobishPro Oct 08 '16 at 17:01
  • Thanks for the praise. I have spent years distilling my advice down to that self-answered Question. Are you saying I have a spelling error in that Q&A (or this one)? If so, I should fix it. – Rick James Oct 08 '16 at 17:20
  • (I assume you spotted my advice on the "black diamonds" you presented in your last update.) – Rick James Oct 08 '16 at 17:39
  • No, you most definitely do not have a spelling error. It just wasn't entirely clear to me (at first) that I had to specifically use `utf8` or `utf8mb4` in the `mysqli_set_charset();` function. I tried it with `UTF-8` the first time after I read both your answers, thinking it was still a PHP thing. – NoobishPro Oct 08 '16 at 18:12