I have a database which has the latin1 default characterset - info obtained by running the following statement:
SELECT default_character_set_name FROM information_schema.SCHEMATA
WHERE schema_name = "schemaname";
The default character set for each table and column in this database is set to utf8.
When I look at the data in the tables I can see data is stored as utf8 e.g the currency symbol €
is stored in the table as €
. Similarly apostraphes are stored as ’
etc.
On the web frontend I have the following meta tag and so the characters render correctly.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
However I'm also seeing a lot of � symbols on the webpage which I don't see inside the database?
When I change the database connection to include the charset utf8 as follows: mysql:host=myhost;dbname=mydatabase;charset=utf8
, the diamond symbols disappear but then all the other utf8
characters revert to exactly how they are saved in the database e.g. the €
symbol renders as €
on the webpage?
Why is this happening?
- How do I fix this and also change character set to utf8mb4?
Any help appreciated.
* UPDATE *
Tried the following steps:
for the database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Not sure if Step 3 is necessary since when I do SHOW CREATE TABLE
after step 2, whilst the definition doesn't display the column charset it does display the default charset for the table as utf8mb4. As a sanity check I did run step 3 on one of the tables columns but it makes no difference - €
is being rendered on the page as €
with db connection as follows:
`mysql:host=myhost;dbname=mydatabase;charset=utf8mb4`
I had to run the following on each column I wanted converting which seems to fix some issues
UPDATE tbl_profiles SET profile =
convert(cast(convert(profile using latin1) as binary) using UTF8MB4);
but still seeing characters such as Iâm
and «Â
and â¢
rendered on the webpage
Any ideas?
* UPDATE 2 *
After running steps 1 and 2 above I have a table column as follows:
`job_salary` VARCHAR(150) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
The following query on this column returns the following result:
SELECT job_salary FROM tbl_jobs WHERE job_id = 2235;
€30,000 plus excellent benefits
I execute the following statement on this column:
UPDATE tbl_jobs SET job_salary = CONVERT(BINARY(CONVERT(job_salary USING latin1)) USING utf8mb4);
But I get the following error which means some other record has a invalid utf8mb4
Invalid utf8mb4 character string: '\x8010000 to \x8020000 Per: annum'