First, the "CHARACTER SET" (such as utf8mb4
) is the encoding. You can search for such in the data.
On top of that, the "COLLATION" (such as utf8mb4_bin
) is how the characters are compared. For that, you need to look in information_schema.COLUMNS
.
To add to the confusion, ±
looks like "Mojibake" for ±
. That is where the encoding got messed up when inserting data. Not all "Mojibake" start with Â
. This gives a list of Mojibake messes from "latin1" characters: http://mysql.rjweb.org/doc.php/charcoll#8_bit_encodings. In the middle of the second table is ±
.
±
is a rather unlikely case to home in on. Can you find another clue?
Here is a regexp to find rows with any 8-bit chars in the column colname
:
SELECT * FROM tbl WHERE HEX(colname) RLIKE '^(..)*[89A-F].';
If the table is huge, it will take a long time. Also you might want to tack a LIMIT
on the end. (That was found in the same document.)
As for analyzing the cause for Mojibake, see Trouble with UTF-8 characters; what I see is not what I stored And for fixing: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
Be sure to pick the case that fits your situation.
But, but, ... If Mojibake is involved, the data may be corrupted. Please provide SHOW CREATE TABLE table
, SHOW VARIABLES LIKE 'char%';
I may need some more things to help you figure out what went wrong and how to fix it. (Unless my links suffice.)
An example of using that regexp:
mysql> SELECT city, country FROM `c2014`
WHERE HEX(city) RLIKE '^(..)*[89A-F].'
and id%1237 = 1 and length(city) < 20
LIMIT 7;
+------------------+---------+
| city | country |
+------------------+---------+
| Çernoleva | al |
| Kashta e Bardhë | al |
| Sharrëdushk | al |
| São Tomé | ao |
| Teca diá Ndala | ao |
| Gömür | az |
| Balèmyouré | bf |
+------------------+---------+
Another note: The output would be the same for either utf8mb4, utf8, or latin1. It actually looks for "non-Ascii" characters.