2

Strangely I can't seem to find anywhere a list of the characters that I can't safely use as literals within MySQL regular expression square brackets without escaping them or requiring the use of a [:character_class:] thing.

(Also the answer probably needs to be MySQL specific because MySQL regular expressions seem to be lacking compared those in Perl/PHP/Javascript etc).

spiderplant0
  • 3,872
  • 12
  • 52
  • 91

2 Answers2

4

Almost all metacharacters (including the dot ., the +, * and ? quantifiers, the end-of-string anchor $, etc.) have no special meaning in character classes, with a few notable exceptions:

  • closing bracket ], for obvious reasons
  • caret ^, which is used to negate the character class (eg: [^ab] matches any character but a and b).
  • hyphen -, which is used to denote a range (eg: [0-9] matches any digit)

However, these can still be added without escaping if placed in strategic locations within the character class:

  • the closing bracket can be placed right after the opening bracket, eg: []a] matches ] or a.
  • the caret can be placed anywhere but after the opening bracket, eg: [a^] matches ^ or a
  • the hyphen can be placed right after the opening bracket or before the closing bracket, eg: [-a] and [a-] both match a and -.

More information can be found in the man page on POSIX regex (thanks Tomalak Geret'kal!)

NullUserException
  • 83,810
  • 28
  • 209
  • 234
0

From the documentation, right near the top:

This section summarizes, with examples, the special characters and constructs that can be used in MySQL for REGEXP operations. It does not contain all the details that can be found in Henry Spencer's regex(7) manual page. That manual page is included in MySQL source distributions, in the regex.7 file under the regex directory.

Said manpage can be found copied here (thanks, Google!). The information you're looking for is available in there.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • Thanks, but thats for the special characters for regular expressions - not the ones for use within square brackets which I believe is a different set. – spiderplant0 Nov 15 '11 at 18:41
  • 1
    @spiderplant0: Huh? `[..]` is part of regular expression syntax, and there are several paragraphs talking about the rules for it. There's no simple list, because the semantics are more complicated than that. You should read the man page I linked you to again, and spend longer than 3 minutes on it this time please. – Lightness Races in Orbit Nov 15 '11 at 18:42
  • 1
    @NullUserExceptionఠ_ఠ: The man page that it links to is _the_ authoritative source of _all_ information pertaining to the topic. It _completely_ answers the question. – Lightness Races in Orbit Nov 15 '11 at 18:46
  • In the standard MYSQl documentation, there are several paragraphs talking about the rules which I read before asking this question. But as I'm not an expert I was confused and didnt appreciate that *, ? etc did not need to be escaped when used in square brackets even though they did in non-square brackets regular expression. So I was asking for a definitive list to clear things up. Hence I accepted NullUserException answer. – spiderplant0 Nov 15 '11 at 18:50
  • 1
    @spiderplant0: I already explained why there is not a "list of characters", and all NullUserException has done is to re-word the authoritative text that you couldn't find. What a waste of time. >. – Lightness Races in Orbit Nov 15 '11 at 18:51
  • 1
    @TomalakGeret'kal +1 for the docs, which made me realize the backslash does *not* have special meaning in POSIX character classes, unlike PCRE. – NullUserException Nov 15 '11 at 18:53
  • I would like to point out however that just a link to an external source isn't really a proper answer on SO. See: http://meta.stackexchange.com/questions/8231/are-answers-that-just-contain-links-elsewhere-really-good-answers – NullUserException Nov 15 '11 at 18:58
  • @NullUserExceptionఠ_ఠ: I usually flag to delete any answers that are just links rather than explanations. However I felt that this was a slightly different case; the core issue here really is that the OP was unable to locate this information in the documentation (see first sentence), and the _most_ benefit to be gained is in helping him/her to better interpret the manual. It's really much more than just a link, though I appreciate that the difference may appear subtle. – Lightness Races in Orbit Nov 15 '11 at 19:29
  • Tomalak Geret'kal, I appreciate your time and effort, and manuals are my first port of call. But sometimes, something in the manual may not be clear to the reader. Even if it seems obvious to someone else. Hence the benefit of SO for clearing things up. So a direct answer to the question is usually best - which is why I chose NullUserExceptionఠ_ఠ's answer. In this case (as it usually is) it was not just about plugging in the solution but to make the subject clearer to the questioner. – spiderplant0 Nov 15 '11 at 20:31
  • @spiderplant0: Please use `@notification` syntax like everybody else; I found your comment only by chance. – Lightness Races in Orbit Nov 16 '11 at 12:49