4

I tried running this query:

SELECT column FROM table WHERE column REGEXP '[^A-Za-z\-\']'

but this returns

#1139 - Got error 'invalid character range' from regexp

which seems to me like the - in the character class is not being escaped, and instead read as an invalid range. Is there some other way that it's suppose to be escaped for mysql to be the literal -?

This regex works as expected outside of mysql, https://regex101.com/r/wE8vY5/1.

I came up with an alternative to that regex which is

SELECT column FROM table WHERE column NOT REGEXP '([:alpha:]|-|\')'

so the question isn't how do I get this to work. The question is why doesn't the first regex work?

Here's a SQL fiddle of the issue, http://sqlfiddle.com/#!9/f8a006/1.

Also, there is no language being used here, query is being run at DB level.

Regex in PHP: http://sandbox.onlinephpfunctions.com/code/10f5fe2939bdbbbebcc986c171a97c0d63d06e55

Regex in JS: https://jsfiddle.net/6ay4zmrb/

chris85
  • 23,846
  • 7
  • 34
  • 51

2 Answers2

10

Just change the order.

SELECT column FROM table WHERE column REGEXP '[^-A-Za-z\']'
Avinash Raj
  • 172,303
  • 28
  • 230
  • 274
  • Yup, changing the order works. Is the escaping not supported though? – chris85 Jul 31 '15 at 16:41
  • What tool did you use? If that string is in PHP, then you need 2 levels of escaping, one for PHP, a second one for SQL. – Rick James Jul 31 '15 at 22:08
  • @RickJames I'm running queries in phpmyadmin directly, no intermediary. I've also tried the double escaping, it also didn't work. http://sqlfiddle.com/#!9/f8a006/1 – chris85 Aug 01 '15 at 11:29
  • It is interesting that changing the order works and that the `-` doesn't need to be escaped at all. – chris85 Aug 01 '15 at 11:35
  • The `-` is a special case since it is used in the syntax for a 'range'. By putting it first, it is clearly not syntax for a range, so it can be treated as an ordinary character. Similarly, some of these: `^$+*` change there meaning depending on context. – Rick James Aug 01 '15 at 14:52
  • If the `'` is giving you trouble, switch to `REGEXP "[^-A-Za-z']"` so you can avoid the backslash. – Rick James Aug 01 '15 at 14:53
  • @RickJames Is there no escaping in the mysql regexp? Why doesn't the `\` make the `-` not a range? – chris85 Aug 01 '15 at 18:19
  • Backslash is the escape character for both PHP and MySQL (and some other languages). My last suggestion could avoid all escaping, just in case there was a problem with it. – Rick James Aug 01 '15 at 23:38
  • `^` is a special character immediately after `[`. It means "invert" the set of characters that follows. – Rick James Aug 01 '15 at 23:39
  • Working from the inside out: 1. "I want the set that includes `-`, upper case letters, lower case letters and `'`." 2. "No, actually I want all characters _except_ those." (due to the `^`) 3. (since there is no 'anchor' (`^` or `$`)), look for such a character anywhere in the string. Note: `^` has two _different_ meanings depending on context. (Actually 3.) – Rick James Aug 01 '15 at 23:43
  • @RickJames http://www.regular-expressions.info/charclass.html `The order of the characters inside a character class does not matter` is that not true in mysql's regex operator? Also, I still don't see why the escaping doesn't work and haven't found and doc that explains it. – chris85 Aug 03 '15 at 18:42
  • `-` must be first (or escaped). `a-z` is a unit. (Yes, I am avoiding escaping, rather than fixing it.) – Rick James Aug 03 '15 at 18:52
  • In order to fix the escaping, we need to see the code. What language are you writing in? Which API (PHP has 3)? Which quotes are you using (Perl has at least 3 ways of quoting)? Etc. I _suspect_ that 2 or 4 backslashes would work, but it depends. – Rick James Aug 03 '15 at 18:54
  • @RickJames This is in mysql directly, no language between. See the SQL fiddle link. One \ seems to act the same as two and four. Here it is with the 4 slashes, http://sqlfiddle.com/#!9/f8a006/6 Why must the `-` be first? This works fine in PHP, http://sandbox.onlinephpfunctions.com/code/10f5fe2939bdbbbebcc986c171a97c0d63d06e55, and JS, https://jsfiddle.net/6ay4zmrb/. – chris85 Aug 03 '15 at 19:32
  • To experiment, get rid of the ^ and the ranges. Instead, focus just on \\- and/or \\'. There are too many things going on to see what is what. – Rick James Aug 03 '15 at 20:49
  • @RickJames I figured it out, the POSIX doesn't do character classes which causes the issue. I posted an answer below on the issue. The only solution is to have the `-` at the start or end; as this answer suggested. Thanks for your help. – chris85 Aug 03 '15 at 23:04
3

@Avinash Raj is correct the - must be first (or last). The \ is not an escape character in POSIX, which is what mysql uses, https://dev.mysql.com/doc/refman/5.1/en/regexp.html.

One key syntactic difference is that the backslash is NOT a metacharacter in a POSIX bracket expression.

-http://www.regular-expressions.info/posixbrackets.html

What special characters must be escaped in regular expressions?

Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally

Community
  • 1
  • 1
chris85
  • 23,846
  • 7
  • 34
  • 51