1

I'm looking for a regular expression that will match all strings EXCEPT those that contain a certain string within. Can someone help me construct it?

For example, looking for all strings that do not have a, b, and c in them in that order.

So
abasfaf3 would match, whereas
asasdfbasc would not

Ray
  • 2,238
  • 1
  • 20
  • 21
  • Any, looking for a general regular expression that can be adapted to any engine. I was really looking for a theoretical answer. – Ray Nov 26 '08 at 22:43

4 Answers4

4

In Python:

>>> r = re.compile("(?!^.*a.*b.*c.*$)")
>>> r.match("abc")
>>> r.match("xxabcxx")
>>> r.match("ab ")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("abasfaf3")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("asasdfbasc")
>>>
Federico A. Ramponi
  • 46,145
  • 29
  • 109
  • 133
2

in perl:

if($str !~ /a.*?b.*?.*c/g)
{
    print "match";
}

should work.

Alan
  • 45,915
  • 17
  • 113
  • 134
  • I see, so actively looking for strings that match what I'm trying to avoid,, then taking the opposite. I can't think of why that wouldn't work... – Ray Nov 26 '08 at 22:41
  • well, you can theoretical do build a regex that matches the opposite. but that regex would be huge. The way you do that is to build a FSM, invert the end-conditions, then build a regex out of it – Johannes Schaub - litb Nov 26 '08 at 22:43
  • @Ray: Yep you got it. In this case it's easier to regex what you don't want then find the strings that dont match the regex. – Alan Nov 26 '08 at 22:51
  • the s is for substitution, which is incorrect in this case. g is for global match – Alan Nov 26 '08 at 23:37
1

Well, you can theoretical build a regex that matches the opposite. But for longer strings, that regex would become big. The way you would do that systematically is (greatly simplified):

  • Convert the regular expression into a deterministic finite automaton
  • Convert the end conditions of the automaton, so that it accepts the inverted regular language
  • Convert the automaton back to a regular expression by successively removing nodes from the automaton, yet keeping the behavior of it the same. Removing one node will require putting two or more regular expressions together, so that they will account for the removed node.
  • If you happen to have one start node, and one end node, you are finished: The regular expression labeling the edge between them is your searched regular expression.

Practically, you can just match for the string you want not have in it, and invert the result. Here is what it would look like in awk:

echo azyxbc | awk '{ exit ($0 !~ /a.*b.*c/); }' && echo matched

If you are interested into this, i recommend the book "Introduction to the Theory of Computation" by Michael Sipser.

Johannes Schaub - litb
  • 496,577
  • 130
  • 894
  • 1,212
0

in Java:

(?m)^a?(.(?!a[^b\r\n]*b[^\r\nc]*c))+$

does match

abasfaf3
xxxabasfaf3

does not match

asasdfbascf
xxxxasasdfbascf
VonC
  • 1,262,500
  • 529
  • 4,410
  • 5,250