5

I'm looking for an SQL statement that will return only rows of my table whose Name field contains special characters (excluding underscores).

I've tried:

SELECT * FROM 'table' WHERE Name REGEXP '^[!#$%&()*+,\-./:;<=>?@[\\\]^`{|}~]+$'

But no dice, this returns an empty result set (despite there being rows I specifically added with Name fields containing %, $, and # characters).

Damon
  • 67,688
  • 20
  • 135
  • 185
saricden
  • 2,084
  • 4
  • 29
  • 40
  • See examples here: http://stackoverflow.com/questions/9604727/oracle-sql-regexp-like-contains-characters-other-than-a-z-or-a-z – Art Jan 30 '13 at 14:13

2 Answers2

4

The first problem seems to be is the ^ and $ signs (Mike C summarized it quicker than I did why...)

But I see escaping problems too: all special characters that mean something in regexp should be escaped specially placed in the [], so [, ], ^, -

Here is a question about how to escape special characters inside character groups in MySQL regexes.

Conclusion detailed in the regex documentation:

A bracket expression is a list of characters enclosed in '[]'. It normally matches any single character from the list (but see below).

  • If the list begins with '^', it matches any single character (but see below) not from the rest of the list.

  • If two characters in the list are separated by '-', this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. '[0-9]' in ASCII matches any decimal digit.

  • It is illegal(!) for two ranges to share an endpoint, e.g. 'a-c-e'. Ranges are very collating sequence-dependent, and portable programs should avoid relying on them.

  • To include a literal ']' in the list, make it the first character (following a possible '^').

  • To include a literal '-', make it the first orlast character, or the second endpoint of a range.

  • To use a literal '-' as the first endpoint of a range, enclose it in '[.' and '.]' to make it a collating element (see below).

With the exception of these and some combinations using '[' (see next paragraphs), all other special characters, including '\', lose their special significance within a bracket expression.

EDIT Here is an SQL fiddle about some interesting regexes regarding the ] character

DDL: create table txt ( txt varchar(200) );

insert into txt values ('ab[]cde');
insert into txt values ('ab[cde');
insert into txt values ('ab]cde');
insert into txt values ('ab[]]]]cde');
insert into txt values ('ab[[[[]cde');
insert into txt values ('ab\\]]]]cde');
insert into txt values ('ab[wut?wut?]cde');

Queries:

Naive approach to match a group of [ and ] chars. Syntactically OK, but the group is the single [ char, and it matches multiple ] chars afterwards.

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[]]+cde';

Escaped -> same ???

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[\]]+cde';

Double escape -> doesn't work, group is now a [ and a \

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[\\]]+cde';

Swapping the closing bracket with the opening one inside the group. This is the weirdest regex I ever wrote - to this point...

SELECT * FROM txt WHERE txt 
REGEXP 'ab[][]+cde';

I will get killed by such a (totally valid!) regex in a weird nightmare, I think:

SELECT * FROM txt WHERE txt 
REGEXP 'ab[]wut?[]+cde';
Community
  • 1
  • 1
ppeterka
  • 20,583
  • 6
  • 63
  • 78
  • Thanks ppeterka, and +1 for raising the issue of the required escapes – Mike C Jan 30 '13 at 13:19
  • @MikeC I was partially wrong about it, tried to write a SQL fiddle, and found interesting results in this topic, which I added to my answer... Very strange, I was convinced they can be reasonably escaped, but I couldn't get `]` to behave... Here is the fiddle for it: http://sqlfiddle.com/#!2/bfc4c/3 – ppeterka Jan 30 '13 at 14:21
3

This regex should match names that ONLY contain special characters. You specify the carat (^) which signifies the start of the string, your character class with your list of special characters, the plus sign (+) to indicate one or more, and then the dollar to signify the end of the string. You need to account for non-special character in the string. You could try something like this:

WHERE Name REGEXP '^.*?[!#$%&()*+,\-./:;<=>?@[\\\]^`{|}~]+.*?$'

I added the .*? at the beginning and end to allow for non-special characters before and after the special character. BTW, you probably don't need the (+) any more, since one special would be enough for a match.

Mike C
  • 3,077
  • 22
  • 29