0

I have a "keyword" column that contains German, Spanish, French specific characters but appear incorrectly. How can I rectify it?

E.g. fähre appears whereas I need it to show fähre

When checking the CHARACTER_SET_NAME and COLLATION_NAME I get utf8mb4 and utf8mb4_general_ci respectively.

I've tried:

  • ALTER TABLE gsSearchAnalyticsTest CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
  • ALTER TABLE gsSearchAnalyticsTest CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  • CONVERT(CAST(keyword as BINARY) USING utf8mb4) as keyword_new
  • CONVERT(keyword USING utf8mb4) as keyword_new
  • Creating a new table and using CHARACTER SET utf8mb4 and COLLATE utf8mb4_unicode_ci (this gives me a Specified key was too long; max key length is 767 bytes error even though biggest VARCHAR is VARCHAR(255))

The only thing that works is this: REPLACE(keyword, 'ä', 'ä') as keyword_new but surely there is an easier way rather than doing a replace for all these type of characters?

If it's too much work in MySQL, then open to solutions using Python where efficient?

AK91
  • 671
  • 2
  • 13
  • 35
  • 1
    Obviously if the character set is already utf8mb4, your `ALTER TABLE` statements would have no effect. And collation simply refers to how text is sorted, not displayed. Probably the data was stored incorrectly and your best bet would be to re-insert it correctly. How are you accessing the data? – miken32 Oct 04 '19 at 15:36
  • `fähre` is what I would excpect a UTF8 string to look like if it was converted to a *single-byte codepage. That's exactly what `fähre` looks like if it gets converted to ASCII. The data looks OK. The problem is how it's displayed – Panagiotis Kanavos Oct 04 '19 at 15:45
  • @miken32 - accessing via sql editor. @PanagiotisKanavos - what to do in order to get it to display `fähre` and not `fähre`? Convert to ASCII? – AK91 Oct 04 '19 at 15:55
  • I don't know what that is. Try accessing from the `mysql` command line, this will give you a better idea of how the data is stored. (Assuming you have a UTF-8 terminal encoding!) – miken32 Oct 04 '19 at 15:56

0 Answers0