1

The data is the names of country subdivisions. Some have been stored as utf8 and some are not. example, this is how they are in my table:

statename

Bocas del Toro
Chiriquí
Coclé
Colón
Darién
Veraguas
Panamá Oeste
Emberá
Kuna Yala
Ngöbe-Buglé

This question/answer gets me really close to a solution: How to fix double-encoded UTF8 characters (in an utf-8 table)

If I use: CONVERT(CAST(CONVERT(statename USING latin1) AS BINARY) USING utf8):

statename

Bocas del Toro
Chiriquí
Coclé
Col
Dari
Veraguas
Panam
Emberá
Kuna Yala
Ng

the characters stored as "é" for example, just end the string.

the variation provided in that answer ,

SELECT CASE
    WHEN CONVERT( CAST( CONVERT( statename USING latin1 ) AS BINARY ) USING utf8 ) IS NULL
        THEN statename
    ELSE CONVERT( CAST( CONVERT( statename USING latin1 ) AS BINARY ) USING utf8 )
END
FROM 

returned the same result, though I am not even sure i implemented it correctly in this select.

I am not permitted to normalize this data in this case, so I would like to select it and get

Bocas del Toro
Chiriquí
Coclé
Colón
Darién
Veraguas
Panamá Oeste
Emberá
Kuna Yala
Ngöbe-Buglé

Will this be possible?

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53

1 Answers1

0

This seems to be an issue with the SQL_MODE. In order the conversion to fail and return NULL - STRICT_TRANS_TABLES mode must be set. You can set it with

SET SESSION sql_mode = CONCAT('STRICT_TRANS_TABLES,', @@sql_mode);

If you don't want to break other "working" queries in the same session, you should reset it after you've got the resutls:

SET @old_sql_mode = @@sql_mode;
SET SESSION sql_mode = CONCAT('STRICT_TRANS_TABLES,', @@sql_mode);

SELECT COALESCE(
  CONVERT( CAST( CONVERT( statename USING latin1 ) AS BINARY ) USING utf8 ), statename
) as statename
FROM yourTable;

SET SESSION sql_mode = @old_sql_mode;

DB Fiddle demo

Note: I have changed your query a bit to use COALESCE() instead of the CASE statement, so you don't need to duplicate the conversion code.

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53