1

Is it possible to search by Unicode range in MySQL? I have used this before, and I thought it worked:

SELECT * FROM people where surname NOT REGEXP "[\u0020\u0027\u002D\u0041-\uFF5A]"

The RegEx will find any surnames not composed of Latin characters, spaces, apostrohpes and hyphens, but it is not working on a table I am testing now.

From what I am reading, it seems this is not possible. If so, are there any workarounds? Such as specifying all the characters manually, e.g.

SELECT * FROM people where surname NOT REGEXP "[ -'abcdefg...]"

UPDATE: The above is an acceptable solution.

Kohjah Breese
  • 4,008
  • 6
  • 32
  • 48
  • 1
    Why can't you just look for normal (non-inverted) regexp, like `[a-zA-Z '-]`? – raina77ow Sep 08 '14 at 15:34
  • Because that RegEx does not look for all the characters in the Latin Unicode tables. – Kohjah Breese Sep 08 '14 at 15:35
  • 1
    I have no experience with MySQL, but try `"[\\u0020\\u0027\\u002D\\u0041-\\uFF5A]"`. The backslash character is the escape character in MySQL strings according to [MySQL String Literals](http://dev.mysql.com/doc/refman/4.1/en/string-literals.html) documentation and must be therefore escaped with one more backslash. – Mofi Sep 09 '14 at 06:13
  • Thanks for the suggestion. It does not work though. It seems this is not supported by MySQL. – Kohjah Breese Sep 09 '14 at 12:32

1 Answers1

4

An old-yet-important question that was answered by @Mofi's comment:

On MySql 8.0 and above, the Unicode prefix \u should be escaped: \\u.

So, for example, to search for the range a-z:

SELECT * FROM people where surname REGEXP "[\\u0061-\\u007a]"

On earlier versions, using Unicode is unsupported (see this question), and what actually happens is a set match, e.g. for [\\u0061], strings containing u, 0, 6 or 1 are matched.

You can try it out with a fiddle that has both new & old version, like this one, and see for yourself.

OfirD
  • 9,442
  • 5
  • 47
  • 90