I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?
-
8Ollie Jones has a much better answer (check the bottom). – Jonathan Arkell Nov 12 '12 at 17:45
-
1@JonathanArkell Not on the bottom anymore :) – Brilliand May 22 '14 at 22:02
-
Correction.. check the middle! ;) – Jonathan Arkell May 23 '14 at 15:26
-
This is the answer @Jonathan is talking about https://stackoverflow.com/a/11741314/792066 – Braiam Oct 08 '18 at 19:53
10 Answers
MySQL provides comprehensive character set management that can help with this kind of problem.
SELECT whatever
FROM tableName
WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
The CONVERT(col USING charset)
function turns the unconvertable characters into replacement characters. Then, the converted and unconverted text will be unequal.
See this for more discussion. https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html
You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian) use CONVERT(columnToCheck USING cp1257)

- 103,626
- 17
- 118
- 172
-
24This is an excellent solution to this problem and much more robust. – CraigDouglas Oct 23 '12 at 14:33
-
6this is also useful to find characters with accents (á ä etc) or character not belonging to encoding – Glasnhost Oct 22 '14 at 12:34
-
3much better than using REGEXP (which doesn't seem to work for me for finding accents) and also provides a simple mechanism for making everything ascii again... – D2TheC Nov 14 '17 at 08:47
-
1This answer works wonderfully and will bring up strings that contain any non-ASCII characters rather than just strings that contain *only* non-ASCII characters. Thank you! – techjp Sep 11 '18 at 07:26
-
3
You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query
SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';
This was the most comprehensive query I could come up with.
-
4Best answer so far, but it's even easier like this : `SELECT * FROM table WHERE LENGTH( column ) != CHAR_LENGTH( column )` – SuN Nov 28 '12 at 18:53
-
16-1 **This can yield erroneous results.** Suppose, for example, that one has a UTF-16 column containing `'ā'` (encoded by the byte sequence `0x0101`) - it would be deemed "ASCII" using this test: *a false negative*; indeed, some character sets do not encode ASCII characters within `0x00` to `0x7f` whereupon this solution would yield a false positive. **DO NOT RELY UPON THIS ANSWER!** – eggyal Feb 11 '14 at 09:13
-
2@sun: That doesn't help at all - many character sets are fixed-length and so `LENGTH(column)` will be a constant multiple of `CHAR_LENGTH(column)` irrespective of the value. – eggyal Feb 11 '14 at 09:39
It depends exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:
SELECT * FROM tableName WHERE columnToCheck NOT REGEXP '[A-Za-z0-9]';
That query will return all rows where columnToCheck contains any non-alphanumeric characters. If you have other characters that are acceptable, add them to the character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:
SELECT * FROM tableName WHERE columnToCheck NOT REGEXP '[A-Za-z0-9.,-]';
The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.

- 1
- 11
- 47
- 78

- 73,098
- 23
- 151
- 149
-
Thanks - I will take a look at that. I don't have much experience with Regular Expressions in SQL, so this will be a good opportunity to learn. – Ed Mays Dec 31 '08 at 03:47
-
3Shouldn't you escape the hyphen and period? (Since they do have special meanings in a regular expression.) SELECT * FROM tableName WHERE NOT columnToCheck REGEXP '[A-Za-z0-9\.,\-]'; – Tooony Mar 12 '09 at 16:03
-
3@Tooony No, inside of a set, a period just means itself and the dash only has special meaning between other characters. At the end of the set, it means only itself. – Michael Speer Jan 25 '12 at 21:55
-
12This query only finds all lines in tableName that do not contain an alphanumeric character. This does not answer the question. – Rob Bailey Aug 26 '13 at 17:26
-
Note the **warning** in the [documentation](https://dev.mysql.com/doc/en/regexp.html): "*The `REGEXP` and `RLIKE` operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.*" – eggyal Feb 11 '14 at 09:35
-
12That is for columns that don't have any ascii characters at all, so it will miss those with a mix of ascii and non-ascii characters. The answer below from zende checks for one or more non-ascii characters. This helped me for the most part `SELECT * FROM tbl WHERE colname NOT REGEXP '^[A-Za-z0-9\.,@&\(\) \-]*$';` – Frank Forte Oct 16 '15 at 21:00
-
Frank above is right. You need the "^" at the beginning of the REGEXP to find any chars NOT in the set. Then eliminate the "NOT" in the query. So (in addition to the better answers below) something like this works: SELECT ... WHERE colname REGEXP '[^A-Za-z0-9.,-]'; – IcarusNM Mar 17 '16 at 16:46
-
Icarus's solution worked for me. In my opinion Chad's answer is not a correct answer, even though it was selected as correct. – Slam Aug 09 '17 at 16:53
-
2This only works (for me anyway) to find strings that contain NONE of those characters. It does not find strings that contain a mix of ASCII and non-ASCII characters. – techjp Sep 11 '18 at 07:24
This is probably what you're looking for:
select * from TABLE where COLUMN regexp '[^ -~]';
It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).

- 30,738
- 21
- 105
- 131
-
7Works great for me. "regexp '[^ -~]'" means has a character that is before space " " or after "~" or ASCII 32 - 126. All letters, numbers, and symbols, but no unprintable things. – Josh Dec 11 '12 at 02:18
-
You can even get it as a tee-shirt ;) http://www.catonmat.net/blog/my-favorite-regex/ – SamGoody Feb 07 '13 at 20:52
-
2Note the **warning** in the [documentation](https://dev.mysql.com/doc/en/regexp.html): "*The `REGEXP` and `RLIKE` operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.*" – eggyal Feb 11 '14 at 09:36
-
1thanks for this. what i'm wondering is how to replace a replacement character - e.g. â�� – mars-o Apr 15 '14 at 14:58
-
space is first printable, ~ last printable, ^ means negate. This regexp looks a bit odd because a space is the beginning of the range – zzapper Jan 27 '15 at 11:52
-
1@mars-o - the black diamond indicates an invalid utf8 character. More discussion [_here_](http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored) – Rick James Apr 08 '17 at 19:57
-
@eggyal This works perfectly fine in utf8 at least, because utf8 preserves ASCII. I wouldn't trust it in utf16. – Brilliand Feb 19 '20 at 22:51
-
One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:
select * from TABLE where COLUMN like '%\0%';

- 30,738
- 21
- 105
- 131

- 1,747
- 15
- 18
Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:
SELECT * FROM `table` WHERE NOT `field` REGEXP "[\\x00-\\xFF]|^$";
It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)
It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:
SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP "[\\x00-\\xFF]";
It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.
Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!

- 28,357
- 12
- 85
- 125
-
100-FF includes all possible 8-bit values, which is what `REGEXP` is checking. Hence it is guaranteed to always match. Also `^$` is probably not what you wanted. – Rick James Apr 08 '17 at 19:55
-
Definitely the best REGEXP solution for finding all 8 bit characters but not as good as the CONVERT(col USING charset) solution which also will allows control characters while limiting display characters to a specific charset. – techjp Dec 25 '18 at 20:07
Try Using this query for searching special character records
SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9@:. \'\-`,\&]'

- 36
- 4
@zende's answer was the only one that covered columns with a mix of ascii and non ascii characters, but it also had that problematic hex thing. I used this:
SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''

- 18,989
- 14
- 66
- 106
In Oracle we can use below.
SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;

- 285
- 4
- 19
for this question we can also use this method :
Question from sql zoo:
Find all details of the prize won by PETER GRÜNBERG
Non-ASCII characters
ans: select*from nobel where winner like'P% GR%_%berg';

- 5
- 1
- 2