I just ran into a problem that, by mere chance, I had not encountered before. To support emoji in specific columns, I had decided to set my mysqli_set_charset() to utf8_mb4, and the encoding of a few columns within my database as well.
PHP now no longer correctly handles accented characters coming from the normal utf8 encoded fields, so I'm stuck with mixed utf8 and utf8mb4 results. Since my data handling is not very strong (I'm used to working with frameworks that handled it all for me), I'm quite unfamiliar with how I could best resolve this.
I have thought about the following options:
1) Set my entire database to the utf8mb4 character set instead of utf8, with a few exceptions.
2) Use mysqli_set_charset() to switch the charset, and simply make sure the queries fetching said data are separated.
Neither of these seems like a great idea to me, but I can't really think of any better solution.
So then there are the remaining questions:
- Will setting my entire db to utf8mb4 instead of utf8 be a big performance change? I do realise that utf8mb4 is bigger and therefore slower, which is why I tried to only use it on the columns in question in the first place.
- Is there a way for me to simply have PHP handle utf8 encoding well, even when the mysqli_charset is on utf8mb4?
- Do you have a better idea?
I'm at a real loss on this subject and I honestly can't guess which option is best. Googling didn't help much, as it only returned links explaining the differences between the two or how to convert a database to utf8mb4, so I would very much love to hear the thoughts of one of the wise SO colleagues!
Columns in this specific case:
My response including PHP's character encoding detection:
arri�n = UTF-8
bolsward = ASCII
go�nga = UTF-8
lo�nga = UTF-8
echt = ASCII
echteld = ASCII
echten (drenthe) = ASCII
echten (friesland) = ASCII
echtenerbrug = ASCII
echterbosch = ASCII
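For anyone wanting to double-check labels like the ones above, here is a minimal sketch that validates the raw bytes instead of guessing the encoding. It assumes the mbstring extension is loaded; describe_bytes() is a hypothetical helper name, and the sample strings are made up for illustration rather than taken from my database.

```php
<?php
// Hypothetical helper: classify a byte string as ASCII, valid UTF-8,
// or not valid UTF-8 at all. Assumes the mbstring extension is available.
function describe_bytes(string $value): string
{
    // Reject anything that is not a well-formed UTF-8 byte sequence.
    if (!mb_check_encoding($value, 'UTF-8')) {
        return 'invalid UTF-8';
    }

    // Pure 7-bit strings are valid UTF-8 too, so report them separately.
    return mb_check_encoding($value, 'ASCII') ? 'ASCII' : 'UTF-8';
}

// Illustrative samples (assumptions, not real rows):
$samples = [
    "bolsward",        // plain ASCII
    "arri\u{00EB}n",   // e-diaeresis encoded as a two-byte UTF-8 sequence
    "arri\xEBn",       // e-diaeresis as a single Latin-1 byte
];

foreach ($samples as $sample) {
    // bin2hex avoids printing bytes the terminal may not be able to render.
    echo bin2hex($sample), ' = ', describe_bytes($sample), PHP_EOL;
}
```

A value that comes back as "invalid UTF-8" here would suggest the connection is delivering bytes in some other charset than the one PHP expects.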
My MYSQLI charset:
mysqli_set_charset($this->getConn(), "utf8mb4");
-- and I just realised the problem was with my mysqli_set_charset. There indeed used to be an underscore in there...
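A typo like that is easy to miss when the result of mysqli_set_charset() is ignored. Below is a rough sketch of a wrapper that makes the failure loud. set_charset_or_fail() is a hypothetical name, and I'm assuming that, depending on the PHP version and mysqli_report() settings, an unknown charset surfaces either as a false return value or as a mysqli_sql_exception, so the sketch handles both.

```php
<?php
// Hypothetical wrapper: set the connection charset and refuse to continue
// if the server or driver rejects the name (e.g. "utf8_mb4").
function set_charset_or_fail(mysqli $conn, string $charset): string
{
    try {
        // Returns false on failure when mysqli is not in exception mode.
        $ok = mysqli_set_charset($conn, $charset);
    } catch (mysqli_sql_exception $e) {
        // Exception mode: rethrow with the offending charset name attached.
        throw new RuntimeException(
            "Could not set charset '{$charset}': " . $e->getMessage(),
            0,
            $e
        );
    }

    if (!$ok) {
        throw new RuntimeException(
            "Could not set charset '{$charset}': " . mysqli_error($conn)
        );
    }

    // Report what the connection is actually using now.
    return mysqli_character_set_name($conn);
}
```

In my case the call would then look like set_charset_or_fail($this->getConn(), "utf8mb4"), and the old name with the underscore should fail immediately instead of silently leaving the connection on its previous charset.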