sql: converting from cp1251 to utf8

Question

How to compile a SQL query that converts the table column from cp1251 to utf8. My database and all the tables in it with the encoding of utf8, but the information inside on cp1251. Because of this information is displayed incorrectly. A simple change of the encoding in the settings turns them into other incomprehensible symbols. I tried to change DEFAULT CHARESET in the dump, but it did not help either :( Maybe there are other methods?

Part of the database:

I did so:

mysqldump -u mysql -p conference_db --allow-keywords --create-options --complete-insert --default-character-set=utf8 --add-drop-table > dump.sql
All ... DEFAULT CHARESET=utf8 changed to ... DEFAULT CHARESET=cp1251 (in CREATE TABLE)
mysql -u mysql -p conference_db --default-character-set=cp1251 < dump.sql

score 1 · Answer 1 · answered Feb 13 '18 at 22:15

It's hard to do this with dump and restore operations.

However, you can convert the character set of the offending column in a SELECT statement using the CONVERT operation: CONVERT(column USING utf8)

For example, if you have a new empty table with the correct character set on the column, this will copy the old table to the new and do the conversion.

 INSERT INTO new_table
 SELECT journal_id, locale, setting_name, 
        CONVERT(setting_value USING utf8) setting_value
        setting_type
   FROM table

Thx! But it does not help.=( The fact is that the column already has encoding utf8. But the text inside resembles cp1251. And every time I try to use transforming functions, they take this text for utf8. — Eugene Kotov, Feb 14 '18 at 02:29

score 0 · Answer 2 · answered Feb 17 '18 at 02:48

The default charset on a table does not matter. What matters is the charset on the string columns.

Back up to the original; I worry that the failed conversion has only made things worse, and more difficult to untangle.

The screenshot looks a lot like "Mojibake". (If you can paste the data in your Question, I can analyze it further.) See "Mojibake" in here for what mistakes cause it.

For help in recovering the data, please provide SHOW CREATE TABLE and SELECT HEX(col) ... so we can see what you have. ("column from cp1251" is ambiguous).

user2586441 · Answer 3 · 2022-07-11T12:53:56.653

An old post, but I had the same problem in 2022 and this thread was the first in google results.

This answer worked for me: How can I convert Cyrillic stored as LATIN1 ( sql ) to true UTF8 Cyrillic with iconv?

And if you stumble on iconv: illegal input sequence at position 113498 then you can replace invalid character(s) with command

printf '-' | dd bs=1 seek=113498 conv=notrunc of=<your file with errors>

this will put hyphen instead of invalid character.

Or you can ignore invalid characters by adding "-c" to iconv:

iconv -c -f utf-8 -t latin1 < input.sql | iconv -f cp1251 -t utf-8 > output.sql

sql: converting from cp1251 to utf8

3 Answers3