0

I am having a MySQL table where some rows are correctly UTF8, some are inserted as ISO. I need to convert them into UTF8.

Eg.

Hahnemühle
Hahnemühle

How can I convert only rows from ISO-8859-1 to UTF8?

Here's what I have tried. I'm checking if iso-to-utf gives valid result:

# create table
create table tmp1 (
    txt varchar(100)
);

# insert data
insert into tmp1 values ('Hahnemühle');
insert into tmp1 values ('Hahnemühle');

# convert
update tmp1
set txt=convert(cast(convert(txt using latin1) as binary) using utf8)
where convert(cast(convert(txt using latin1) as binary) using utf8) is not null;

It gives me: [HY000][1300] Invalid utf8 character string: '...'

Which is weird because result of:

select txt
from tmp1
where convert(cast(convert(txt using latin1) as binary) using utf8) is not null

is:

Hahnemühle
JohnSmith
  • 436
  • 5
  • 17
  • You may be interested to learn that MySQL UTF-8 is not, in fact, UTF-8, but is only a *subset* of UTF-8 characters. You could possibly want to use `utf8mb4_` collation instead. [read here for more information](http://stackoverflow.com/a/279279/3536236) – Martin Dec 08 '16 at 13:53
  • Whoa, interesting fact. But `convert(cast(convert(txt using latin1) as binary) using utf8mb4)` still does not help – JohnSmith Dec 08 '16 at 14:05
  • That's why I didn't put it as an answer, but I would suspect you want to translate to `utf8mb4_` rather than `utf8_` – Martin Dec 08 '16 at 15:38
  • Study http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored/38363567 , especially the discussions about Mojibake. Come back if you need more help. – Rick James Dec 09 '16 at 01:14
  • `utf8mb4` is better, but it is not required for any European languages; `utf8` will suffice. (I'm saying that any discussion of utf8mb4 vs utf8 here will be a 'red herring'.) – Rick James Dec 09 '16 at 01:15

0 Answers0