I'm building an application that previously had to handle English and German text. Now I'm adding the ability to translate text into Russian and Chinese as well. But it seems that Cyrillic and Chinese characters can't be saved in the default latin1 charset. I switched my charset to utf8_general_ci using phpMyAdmin. It works fine, and I can save new content with every character I need.

The problem is that all the old umlauts like ä, ö, ü are replaced with ?. It's weird, because if I enter the umlauts again and save them to the database, it works correctly. So changing the charset seems to have transformed every existing umlaut into a "?".

Can someone point me in a direction where I can change the charset without breaking all the old content?

Thanks!

jDoe

1 Answer

There are many ways to "change" the character set being used. Each works for one situation, but makes things worse for other situations.

Unfortunately, you are in the worst of all situations -- the data is gone, and replaced by "?".

If you can reload the data, then use this to analyze the situation and this to identify the appropriate fix.

Since you will be wanting to store Chinese, you must end up with CHARACTER SET utf8mb4, not just utf8. (English, Cyrillic, and German are included.)
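To make the utf8 versus utf8mb4 distinction concrete, here is a minimal Python sketch (Python is used only for illustration; the sample characters and helper names are my own assumptions, not from the question). It counts the UTF-8 bytes each character needs. MySQL's utf8 stores at most 3 bytes per character, so anything needing 4 bytes requires utf8mb4.

```python
# Sample characters chosen for illustration only.
samples = {
    "German a-umlaut": "ä",
    "Cyrillic zhe": "ж",
    "Common Chinese character": "中",
    "Rare Chinese character (U+20000)": "\U00020000",
}


def utf8_length(ch):
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_byte_utf8(ch):
    """True if the character fits in MySQL's 3-byte utf8 (utf8mb3)."""
    return utf8_length(ch) <= 3


for label, ch in samples.items():
    print(label, utf8_length(ch), fits_in_3_byte_utf8(ch))
```

Many everyday Chinese characters fit in 3 bytes, which is why the problem can go unnoticed until a rarer character (or an emoji) shows up.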

If you want more specific information, I need more details: the output of SHOW CREATE TABLE, SELECT HEX(..) from the first link, etc.
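As a rough guide to reading the HEX output for a value that should be ä, here is a small Python sketch. The function name `diagnose_hex` and the wording of its results are hypothetical, and it only covers a few common cases, so treat it as a starting point rather than a complete diagnosis.

```python
def diagnose_hex(hex_value, expected="ä"):
    """Guess what happened to `expected`, given the HEX() of the stored value.

    Simplification: Python's latin-1 differs from MySQL's latin1 (cp1252)
    for bytes 0x80-0x9F; that does not matter for ä.
    """
    raw = bytes.fromhex(hex_value)
    as_utf8 = expected.encode("utf-8")
    as_latin1 = expected.encode("latin-1")
    # UTF-8 bytes misread as latin1 and then encoded to UTF-8 a second time.
    double_encoded = as_utf8.decode("latin-1").encode("utf-8")

    if raw == as_utf8:
        return "stored as UTF-8"
    if raw == as_latin1:
        return "stored as latin1"
    if raw == double_encoded:
        return "double-encoded UTF-8"
    if raw == b"?":
        return "literal question mark: original character is lost"
    return "unrecognized"


for value in ("C3A4", "E4", "C383C2A4", "3F"):
    print(value, "->", diagnose_hex(value))
```

If the stored bytes are 3F, the column really contains a question mark, and no later charset change can bring the umlaut back.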

Some charset changes require changing the bits. Using ä, for example:

                  Character set(s)         HEX encoding

                            utf8mb4, utf8  C3A4
                 cp1250, cp1257, dec8,
           latin1, latin2, latin5, latin7  E4
                    cp850, cp852, keybcs2  84
                            eucjpms, ujis  8FABA3
                                  gb18030  81308A31
                                      hp8  CC
                          macce, macroman  8A
                                     swe7  7B

English letters do not change between most character sets. Accented letters are a mixed bag: some charsets share an encoding for a given letter, others do not.
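A few rows of the table can be reproduced with Python's built-in codecs. This is only a sketch: the mapping from MySQL charset names to Python codec names is my assumption, and I have limited it to charsets whose Python equivalents I am sure of.

```python
# MySQL charset name -> Python codec name (assumed equivalents).
CODECS = {
    "utf8mb4, utf8": "utf-8",
    "latin1": "latin-1",
    "cp850": "cp850",
    "macroman": "mac_roman",
}


def hex_of(ch, codec):
    """Upper-case hex of the character's bytes in the given codec."""
    return ch.encode(codec).hex().upper()


for name, codec in CODECS.items():
    # The accented letter changes per charset; the plain letter does not.
    print(f"{name:>15}  ä={hex_of('ä', codec):<6} a={hex_of('a', codec)}")
```

The plain letter a comes out as 61 in every one of these charsets, while ä differs from row to row, which is why converting between charsets means changing the bits.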

  • utf8 is a subset of utf8mb4
  • Most "latin*" charsets have some overlap or are missing characters.
  • ASCII does not include any accented letters, so conversion to it will probably produce "?".
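The last point is easy to see outside the database too. Below is a minimal Python sketch (the helper name `to_ascii` is hypothetical) that forces text into ASCII with replacement, which is roughly what happens when a conversion hits a character the target charset lacks.

```python
def to_ascii(text):
    """Encode to ASCII, substituting '?' for anything ASCII cannot represent."""
    return text.encode("ascii", errors="replace").decode("ascii")


original = "Grüße"
converted = to_ascii(original)
print(converted)

# The substitution is one-way: nothing in the result records which
# character each '?' used to be.
print(converted.encode("ascii").hex().upper())
```

Each replaced letter becomes the single byte 3F, so the original text cannot be reconstructed from the converted value.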
Rick James