0

I have problem with MySQL chinesse marks from latin to utf-8 conversion.

I set all variables

ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

SET CHARACTER SET utf8mb4;
SET NAMES utf8mb4;

after that i execute script :

select id,convert(binary convert(`address` using latin1) using utf8mb4),address from users where id = 11111;

and i receive good results:

'11111', '一二 三四五', '一二 三四五'

so now i want to update this row so i execute:

UPDATE `users` SET `address` = convert(binary convert(`address` using latin1) using utf8mb4) where id = 11111;

but in output i get information:

1 row(s) affected, 1 warning(s): 1366 Incorrect string value: '\xE4\xB8\x80\xE4\xBA\x8C...' for column 'address' at row 1 Rows matched: 1  Changed: 1  Warnings: 1

and after that i select address for that user i get : '?????????'

Please help,

Regards

Juri Bojka
  • 305
  • 2
  • 8
  • 18
  • Latin-1 cannot encode Chinese at all. What are you trying to do? – Álvaro González Jan 30 '17 at 15:56
  • @ÁlvaroGonzález i want to convert latin1 to utf-8 and i have chinese marks so i want to convert it. But i dont know why select prints good results but update has warnings and add '????' . Regards – Juri Bojka Jan 30 '17 at 15:58
  • Likely the charset of the specific column is explicitly set to `latin1`. You'll have to convert the column's encoding. The `DATABASE` setting is just a default in case the table/column doesn't have one. – deceze Jan 30 '17 at 16:05
  • Sorry if I didn't explain myself properly. Those good results (`'11111', '一二 三四五', '一二 三四五'`) cannot be correct. Apart from just *looking* like gibberish, `四五` cannot be the contents of a properly encoded Latin-1 string because Latin-1 has a very limited character catalogue that does not have a single ideogram (see the [complete list](https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout)). Are you trying to fix data that's incorrectly encoded? – Álvaro González Jan 30 '17 at 16:07
  • @ÁlvaroGonzález so how can i convert it to utf8 when latin1 is '一二 三四五' ?? – Juri Bojka Jan 30 '17 at 16:12
  • See Mojibake and Question marks in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Jan 30 '17 at 23:31

1 Answers1

2

The most likely explanation for the behavior you observe (just guessing here) is the characterset of the column address.

The column definition includes the characterset for the column. If a characterset wasn't specified when the column was created, it would have been set to the default characterset for the table. And if a default characterset of the table wasn't specified when the table was created, it would have been set to the default characterset of the database.

The ALTER DATABASE mydb CHARACTER SET statement does not have any effect on tables and columns that are already created. The specified characterset will be applied to any newly created tables where the default characterset is not specified.


We can do a SHOW CREATE TABLE users to see the characterset of the address column.

It's not at all clear what you are attempting to achieve.

spencer7593
  • 106,611
  • 15
  • 112
  • 140