I'm getting errors when certain characters are inserted into a table, even though the column uses the utf8mb4
character set. For example:
SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xE0' for column 'surname'
The data in question is: SEGAL‡ (note the double dagger)
Is this character beyond even 4-byte UTF-8, or is the collation causing the issue? Or is it something else?
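One quick way to check is to look at the bytes directly. The minimal Python sketch below is a standalone check, not part of the Laravel app. It shows that U+2021 DOUBLE DAGGER encodes to only three bytes in UTF-8, so it would fit even in MySQL's 3-byte `utf8`. It also shows that a lone 0xE0 byte, the one named in the error, is not valid UTF-8 on its own. That suggests the problem lies in the bytes being sent rather than in the column definition.

```python
# Standalone check: how is the double dagger encoded in UTF-8,
# and is the single byte 0xE0 from the MySQL error valid UTF-8?

DOUBLE_DAGGER = "\u2021"  # the '‡' character

utf8_bytes = DOUBLE_DAGGER.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 a1 -> only 3 bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))    # True
# 0xE0 is a lead byte that announces a 3-byte sequence; with no
# continuation bytes after it, the input is rejected.
print(is_valid_utf8(b"SEGAL\xe0"))  # False
```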
Screenshot showing character set and collation of the column:
It's a Laravel 8 app, and the MySQL connection is configured as follows:
'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
Looking at the CSV file in PhpStorm, non-ASCII characters are displayed as �. I've tried explicitly setting the file encoding to UTF-8 in PhpStorm (with and without BOM).
If I open the CSV in Excel, the non-ASCII characters display correctly. Confused.
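The � is U+FFFD REPLACEMENT CHARACTER, which UTF-8 decoders typically substitute for bytes they cannot decode. This small Python sketch reproduces the effect. It assumes the file holds a lone 0x8A byte where ä should be, as the hex dump in the update shows. The sample string `raw` is hypothetical.

```python
# Simulate what a UTF-8 editor does with a byte it cannot decode.
raw = b"M\x8ark"  # hypothetical sample: 'M', 0x8A, 'r', 'k'

# 0x8A is a UTF-8 continuation byte, so it is invalid at the start of a
# character; errors="replace" swaps it for U+FFFD instead of raising.
as_utf8 = raw.decode("utf-8", errors="replace")
print(as_utf8)              # prints M�rk
print("\ufffd" in as_utf8)  # True
```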
Update
Examining the CSV in a hex editor shows that a character like ä is stored as a single byte (8A). When this CSV is opened in Excel it correctly shows ä, but everything else shows �.
I don't know what character encoding Excel is using, since this character would typically be stored as E4 in a single-byte encoding such as ISO-8859-1 or Windows-1252, or as C3 A4 in UTF-8.
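One way to narrow this down is to check which single-byte code pages map the two observed bytes to the expected characters: 0x8A to ä from the hex dump, and 0xE0 to ‡ from the failing `SEGAL‡` row. The Python sketch below tests a handful of candidate codecs from the standard library, and the candidate list is an assumption, not an exhaustive search. Of these candidates, only `mac_roman` (Mac OS Roman, a legacy Macintosh code page that Excel for Mac has been known to use for plain CSV files) maps both bytes as expected. The `reencode_to_utf8` helper is hypothetical.

```python
# Which single-byte encodings map the observed bytes to the expected characters?
#   0x8A should be 'ä'  (from the hex editor)
#   0xE0 should be '‡'  (from 'SEGAL‡' and the 1366 error)
OBSERVED = {b"\x8a": "\u00e4", b"\xe0": "\u2021"}

# Candidate code pages (assumption: not an exhaustive list).
CANDIDATES = ["latin_1", "cp1252", "cp437", "cp850", "mac_roman"]


def matching_encodings(observed, candidates):
    """Return the candidate codecs that decode every observed byte as expected."""
    matches = []
    for codec in candidates:
        if all(raw.decode(codec) == expected for raw, expected in observed.items()):
            matches.append(codec)
    return matches


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: convert raw CSV bytes to UTF-8 before importing them."""
    return data.decode(source_encoding).encode("utf-8")


# The expected encodings of 'ä' mentioned above: E4 (single byte) and C3 A4 (UTF-8).
print("\u00e4".encode("latin_1").hex(), "\u00e4".encode("utf-8").hex(" "))  # e4 c3 a4

print(matching_encodings(OBSERVED, CANDIDATES))  # ['mac_roman']

sample = b"SEGAL\xe0"  # the failing surname as raw single-byte data
print(reencode_to_utf8(sample, "mac_roman").decode("utf-8"))  # SEGAL‡
```

If that match holds for the rest of the file, decoding the CSV with the detected encoding, or converting it to UTF-8 before the import, would hand MySQL valid UTF-8 bytes.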