186

Can a regular expression be used to match any string except a specific string constant (e.g. "ABC")?

Is it possible to exclude just one specific string constant?

Alexander Abakumov
  • Which tool are you using? Depending upon the tool, there might be a way to specify this external to your regex. grep supports a -v option to invert the sense of the match, for example. – Will Bickford Sep 08 '09 at 17:23
  • So are you looking to match every character of a given string, except the ABC part of it? In other words, "A string with ABC" would match "A string with ". – Steve Wortham Sep 08 '09 at 19:06

5 Answers

202

You have to use a negative lookahead assertion.

(?!^ABC$)

You could for example use the following.

(?!^ABC$)(^.*$)

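If this does not work in your editor, try the following, which has been tested in Ruby and JavaScript. Note that it is stricter: it rejects every string that contains ABC anywhere, not only the string ABC itself: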

^((?!ABC).)*$
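
The two patterns above do not accept the same set of strings, which is easy to miss. As a rough illustration, here is a minimal sketch that runs both against a few samples. It uses Python's `re` module rather than an editor or Ruby/JavaScript; the names `NOT_EXACTLY_ABC`, `NOT_CONTAINING_ABC` and `matches` are made up for this example.

```python
import re

# Pattern 1: rejects only the exact string "ABC".
NOT_EXACTLY_ABC = re.compile(r"(?!^ABC$)(^.*$)")
# Pattern 2: rejects any string that contains "ABC" anywhere.
NOT_CONTAINING_ABC = re.compile(r"^((?!ABC).)*$")


def matches(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.search(text) is not None


# Print how each pattern treats a handful of sample inputs.
for sample in ["ABC", "ABCD", "hi ABC wow", "AB", ""]:
    print(repr(sample),
          matches(NOT_EXACTLY_ABC, sample),
          matches(NOT_CONTAINING_ABC, sample))
```

Only the first pattern accepts "hi ABC wow" and "ABCD"; both reject "ABC" itself. Which one is appropriate depends on how the question is read.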
John Albietz
Daniel Brückner
  • This will work if you're looking for a string that does not include ABC. But is that the goal? Or is the goal to match every character except ABC? – Steve Wortham Sep 08 '09 at 18:37
  • Thanks for pointing that out, you are right - my suggestion only avoids strings starting with ABC - I forgot to anchor the assertion. Going to correct that. – Daniel Brückner Sep 08 '09 at 18:56
  • That's still different than what I was thinking. Perhaps the questioner will clarify what they're looking for. – Steve Wortham Sep 08 '09 at 19:04
  • I find it quite clear - "any string except a specific string [constant]" hence any string (including strings containing ABC) except ABC itself. – Daniel Brückner Sep 08 '09 at 19:09
  • Yeah, you may be right. If so then your answer is perfect. You can see my answer to see how I interpreted it. – Steve Wortham Sep 08 '09 at 19:48
  • not working in Javascript. text = 'hi ABC wow'; regex = /(?!^ABC$)/ console.log(text.match(regex)); – Nick Vanderbilt Mar 11 '10 at 17:13
  • You only used the assertion but forgot the matching expression - use (?!^ABC$)(^.*$) and it works. – Daniel Brückner Mar 13 '10 at 10:33
  • I was helping a friend recently to do something very similar. However, he didn't want to match the string if it contained a given string anywhere inside of it. So I wrote a slightly modified version of your expression, `(?!.*ABC)^.*$`, and this works like a charm. – Steve Wortham Apr 22 '10 at 15:41
9

In .NET you can use grouping to your advantage like this:

http://regexhero.net/tester/?id=65b32601-2326-4ece-912b-6dcefd883f31

You'll notice that:

(ABC)|(.)

will capture everything except ABC in the 2nd group. Parentheses surround each group, so (ABC) is group 1 and (.) is group 2.

So you just grab the 2nd group like this in a replace:

$2

Or in .NET, look at the Groups collection on the Match object returned by the Regex class for a little more control.

You should be able to do something similar in most other regex implementations as well.
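
As one possible illustration of "something similar" outside .NET, the sketch below applies the same two-group pattern with Python's `re` module and keeps only what group 2 captured. The helper name `strip_abc` is hypothetical, and this is only one way to consume the groups.

```python
import re

# Group 1 captures the literal "ABC"; group 2 captures any other single character.
PATTERN = re.compile(r"(ABC)|(.)")


def strip_abc(text):
    """Join the group-2 captures, which drops every occurrence of ABC."""
    return "".join(
        m.group(2) for m in PATTERN.finditer(text) if m.group(2) is not None
    )


print(strip_abc("A string with ABC"))  # prints "A string with "
```

Because `.` does not match a newline by default, line breaks are skipped as well unless the pattern is compiled with `re.DOTALL`.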

UPDATE: I found a much faster way to do this here: http://regexhero.net/tester/?id=997ce4a2-878c-41f2-9d28-34e0c5080e03

It still uses grouping (I can't find a way that doesn't use grouping). But this method is over 10X faster than the first.

Steve Wortham
7

This isn't easy, unless your regexp engine has special support for it. The easiest way would be to use a negative-match option, for example:

$var !~ /^foo$/
    or die "too much foo";

If not, you have to do something evil:

$var =~ /^(($)|([^f].*)|(f[^o].*)|(fo[^o].*)|(foo.+))$/
    or die "too much foo";

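That one basically says "if it starts with non-f, the rest can be anything; if it starts with f, non-o, the rest can be anything; if it starts fo, the next character had better not be another o; and if it starts foo, at least one more character must follow". Note that, as written, the bare prefixes f and fo are rejected too, because each of those alternatives requires one more character.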
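
To see how the alternation behaves on short inputs, here is a minimal sketch that ports it to Python's `re` module (an assumption; the original is Perl). `RELAXED` is my own variant, not part of the answer: it makes the tail after `f` and `fo` optional so that those two prefixes are accepted while foo is still rejected.

```python
import re

# The alternation from the Perl snippet, unchanged.
ORIGINAL = re.compile(r"^(($)|([^f].*)|(f[^o].*)|(fo[^o].*)|(foo.+))$")
# Hypothetical variant: the tail after "f" and "fo" is optional.
RELAXED = re.compile(r"^(($)|([^f].*)|(f([^o].*)?)|(fo([^o].*)?)|(foo.+))$")


def accepted(pattern, text):
    """Return True if the pattern accepts the text."""
    return pattern.search(text) is not None


# Compare both patterns on the empty string, prefixes of foo, and longer strings.
for sample in ["", "bar", "f", "fo", "foo", "fox", "food"]:
    print(repr(sample), accepted(ORIGINAL, sample), accepted(RELAXED, sample))
```

Both versions reject foo and accept the empty string; they differ only on f and fo.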

derobert
  • That won’t allow the empty string, `f`, `fo` and `foo`. – Gumbo Sep 08 '09 at 21:14
  • @Gumbo: It allows the empty string just fine; notice that ($) is the first alternative, so ^$ (empty string) is accepted. I tested it, and at least in perl 5.0.10 the empty string is accepted. – derobert Sep 08 '09 at 21:50
  • ... sorry, perl 5.10.0, of course! – derobert Sep 08 '09 at 21:51
7

Try this regular expression:

^(.{0,2}|([^A]..|A[^B].|AB[^C])|.{4,})$

It describes three cases:

  1. fewer than three arbitrary characters
  2. exactly three characters, where either
    • the first is not A, or
    • the first is A but the second is not B, or
    • the first is A, the second B but the third is not C
  3. more than three arbitrary characters
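
The three cases can be checked by brute force. The following is a small sketch, assuming Python's `re` module and a tiny test alphabet of my choosing (`ABCx`); it compares the pattern against the plain rule "anything but ABC" for every string up to length 5. The helper `all_strings` is hypothetical.

```python
import re
from itertools import product

PATTERN = re.compile(r"^(.{0,2}|([^A]..|A[^B].|AB[^C])|.{4,})$")


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


# Collect strings where the pattern and the rule `s != "ABC"` disagree.
mismatches = [
    s for s in all_strings("ABCx", 5)
    if (PATTERN.search(s) is not None) != (s != "ABC")
]
print(mismatches)  # prints []
```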
Gumbo
6

You could use negative lookahead, or something like this:

^([^A]|A([^B]|B([^C]|C.|$)|$)|$).*$

Maybe it could be simplified a bit.
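
A nested alternation like this is easy to get subtly wrong, so a few spot checks may help. The sketch below assumes Python's `re` module and uses the form of the pattern that has a `C.` branch (ABC followed by at least one more character); without that branch, any string that merely starts with ABC, such as ABCD, would be rejected as well. The `CASES` table is made up for illustration.

```python
import re

# Nested alternation; the "C." branch accepts ABC followed by at least one more character.
PATTERN = re.compile(r"^([^A]|A([^B]|B([^C]|C.|$)|$)|$).*$")

# Sample inputs mapped to whether they should be accepted.
CASES = {
    "": True,
    "A": True,
    "AB": True,
    "ABC": False,
    "ABCD": True,
    "ABD": True,
    "xyz": True,
}

# Print the actual result next to the expected one for each sample.
for text, expected in CASES.items():
    actual = PATTERN.search(text) is not None
    print(repr(text), actual, "expected", expected)
```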

Adam Crume