175

How do I write a regex to match any string that doesn't meet a particular pattern? I'm faced with a situation where I have to match an (A and ~B) pattern.

Aleks
notnot
  • PCRE would be best for this: see [Regex Pattern to Match, Excluding when… / Except between](https://stackoverflow.com/questions/23589174). I removed `findstr` tag since all answers here are not valid for the tag. – Wiktor Stribiżew Mar 04 '20 at 09:22

8 Answers

198

You could use a look-ahead assertion:

(?!999)\d{3}

This example matches three digits other than 999.
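As a minimal sketch of the look-ahead above, assuming Python's `re` module (which supports look-ahead assertions); the constant `NOT_999` and the helper `is_three_digits_not_999` are hypothetical names introduced only for illustration:

```python
import re

# Negative look-ahead: refuse to start a match where "999" follows,
# then consume exactly three digits.
NOT_999 = re.compile(r"(?!999)\d{3}")

def is_three_digits_not_999(s: str) -> bool:
    """Return True if s consists of exactly three digits other than 999."""
    return NOT_999.fullmatch(s) is not None

print(is_three_digits_not_999("123"))  # True
print(is_three_digits_not_999("999"))  # False
```

`fullmatch` is used here so the whole string is tested; with `search`, the pattern could still match three digits inside a longer run such as "19990".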


But if you happen not to have a regular expression implementation with this feature (see Comparison of Regular Expression Flavors), you probably have to build a regular expression with the basic features on your own.

A compatible regular expression with basic syntax only would be:

[0-8]\d\d|\d[0-8]\d|\d\d[0-8]

This also matches any three-digit sequence other than 999.
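One way to convince yourself that the two patterns agree is to compare them on every three-digit string. The sketch below assumes Python's `re` module; the variable names are hypothetical:

```python
import re

lookahead = re.compile(r"(?!999)\d{3}")
basic = re.compile(r"[0-8]\d\d|\d[0-8]\d|\d\d[0-8]")

# Compare both patterns on every three-digit string from "000" to "999".
candidates = [f"{n:03d}" for n in range(1000)]
disagreements = [
    s for s in candidates
    if (lookahead.fullmatch(s) is None) != (basic.fullmatch(s) is None)
]
# Strings that the basic-syntax pattern refuses to match.
rejected = [s for s in candidates if basic.fullmatch(s) is None]

print(disagreements)  # []
print(rejected)       # ['999']
```

Inputs such as 009 or 019 are accepted by the first alternative, `[0-8]\d\d`, because their leading digit is not 9.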

Pro Backup
Gumbo
  • 1
    Look-ahead is not standard regular expression syntax, it is a Perl extension, it will only work in Perl, PCRE (Perl-Compatible RegEx) or other non-standard implementations – Juliano Mar 04 '09 at 19:26
  • 10
    It may not be standard, but don't most modern languages support it? What language *doesn't* support look-aheads these days? – Bryan Oakley Mar 04 '09 at 19:45
  • 1
    That’s true. But most regex flavors support this feature (see ). – Gumbo Mar 04 '09 at 19:49
  • Turns out that the windows findstr function only supports pure DFA-style regex anyway, so I need to just do it all differently. You still get the answer, though. – notnot Mar 04 '09 at 21:38
  • 1
    i think the last regex would also not match 009, 019... etc – Sebastian Viereck Sep 26 '13 at 10:03
  • 1
    Standard Lex for C does not use PCREs :-( – pieman72 Feb 02 '15 at 21:54
32

Suppose you want to match strings that contain a word A but do not contain a word B. For example, take this text:

1. I have a two pets - dog and a cat
2. I have a pet - dog

If you want to search for lines of text that HAVE a dog for a pet and DON'T have a cat, you can use this regular expression:

^(?=.*?\bdog\b)((?!cat).)*$

It will match only the second line:

2. I have a pet - dog
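A short sketch of this filter, assuming Python's `re` module and the two sample lines above; `DOG_NO_CAT`, `lines` and `matching` are hypothetical names:

```python
import re

# ^(?=.*?\bdog\b)  : the line must contain the whole word "dog"
# ((?!cat).)*$     : no position in the line may start the text "cat"
DOG_NO_CAT = re.compile(r"^(?=.*?\bdog\b)((?!cat).)*$")

lines = [
    "1. I have a two pets - dog and a cat",
    "2. I have a pet - dog",
]
matching = [line for line in lines if DOG_NO_CAT.search(line)]
print(matching)  # ['2. I have a pet - dog']
```

Note that `(?!cat)` has no word boundaries, so a line containing "category" would be rejected as well; `(?!\bcat\b)` would restrict the exclusion to the whole word.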
Aleks
  • He failed to mention it in the question, but the OP is actually using the DOS `findstr` command. It affords only a tiny subset of the capabilities you expect to find in a regex tool; lookahead is not among them. (I just added the [tag:findstr] tag myself.) – Alan Moore Feb 21 '13 at 13:42
  • 2
    hm, yes, I found now in one of his comments on the posts. I saw Regex in the title. Anyways, if somebody finds this post when searching for the same for regular expression, like I did, maybe it could be helpful to someone :) thanks for comments – Aleks Feb 21 '13 at 13:59
15

Match against the pattern and use the host language to invert the boolean result of the match. This will be much more legible and maintainable.
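A minimal sketch of this idea, assuming Python's `re` module and assuming that A and B can be tested independently on the same string; the function name `matches_a_not_b` and the sample patterns are hypothetical:

```python
import re

def matches_a_not_b(text: str, pattern_a: str, pattern_b: str) -> bool:
    """Return True when pattern_a is found in text and pattern_b is not."""
    found_a = re.search(pattern_a, text) is not None
    found_b = re.search(pattern_b, text) is not None
    # The negation is done by the host language, not by the regex.
    return found_a and not found_b

print(matches_a_not_b("I have a pet - dog", r"\bdog\b", r"\bcat\b"))            # True
print(matches_a_not_b("I have a dog and a cat", r"\bdog\b", r"\bcat\b"))        # False
```

This only covers the case where both tests apply to the whole string; it does not handle B being checked only against the part of the string that A left unmatched.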

Ben S
  • 1
    Then I just end up with (~A or B) instead of (A and ~B). It doesn't solve my problem. – notnot Mar 04 '09 at 21:06
  • 1
    Pseudo-code: String toTest; if (toTest.matches(A) AND !toTest.matches(B)) { ... } – Ben S Mar 04 '09 at 21:54
  • I should have been more clear - the pieces are not fully independent. If A matches part of the string, then we care if ~B matches the rest of it (but not necessarily the whole thing). This was for the windows command-line findstr function, which i found is restricted to true regexs, so moot point. – notnot Mar 04 '09 at 22:07
8

notnot, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

I'm faced with a situation where I have to match an (A and ~B) pattern.

The basic regex for this is frighteningly simple: B|(A)

You just ignore the overall matches and examine the Group 1 captures, which will contain A.

An example (with all the disclaimers about parsing HTML with regex): A is digits, B is digits within an <a> tag

The regex: <a.*?<\/a>|(\d+)
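A small sketch of reading only the Group 1 captures, assuming Python's `re` module; the sample `html` string and the names `PATTERN` and `digits_outside_links` are made up for illustration:

```python
import re

# Left alternative (B): a whole <a ...>...</a> element, matched and discarded.
# Right alternative (A): digits, captured in group 1.
PATTERN = re.compile(r"<a.*?</a>|(\d+)")

html = '12 <a href="/p/34">item 56</a> 78'

# Keep only the matches where group 1 took part, i.e. digits outside links.
digits_outside_links = [
    m.group(1) for m in PATTERN.finditer(html) if m.group(1) is not None
]
print(digits_outside_links)  # ['12', '78']
```

The digits 34 and 56 are consumed by the left alternative, so they never reach group 1.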

Demo (look at Group 1 in the lower right pane)

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

Community
zx81
  • This sounds too good to be true! Unfortunately, this solution is not universal and it fails in Emacs, even after replacing `\d` with `[[:digit:]]`. [The first reference](https://stackoverflow.com/questions/23589174/regex-pattern-to-match-excluding-when-except-between) mentions it is specific to Perl and PHP: "There is a variation using syntax specific to Perl and PHP that accomplishes the same." – miguelmorin Oct 24 '18 at 12:43
4

The complement of a regular language is also a regular language, but to construct it you have to build the DFA for the regular language and swap its accepting and non-accepting states, so that every previously valid outcome becomes an error and vice versa. See this for an example. What the page doesn't say is that it converted /(ac|bd)/ into /(a[^c]?|b[^d]?|[^ab])/. The conversion from a DFA back to a regular expression is not trivial. It is easier if you can use the regular expression unchanged and change the semantics in code, as suggested before.
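The DFA construction can be sketched by hand for a tiny case. The code below is an illustration only, under stated assumptions: the alphabet is limited to `abcd`, the language is "the whole string is ac or bd", and all state names and helpers are hypothetical. Complementing a complete DFA amounts to swapping its accepting and non-accepting states:

```python
from itertools import product

ALPHABET = "abcd"
START, SAW_A, SAW_B, DONE, DEAD = range(5)
STATES = {START, SAW_A, SAW_B, DONE, DEAD}

# Only the useful transitions are listed; every other move goes to DEAD,
# which makes the DFA complete (needed before complementing).
EDGES = {
    (START, "a"): SAW_A,
    (START, "b"): SAW_B,
    (SAW_A, "c"): DONE,
    (SAW_B, "d"): DONE,
}

def accepts(word: str, accepting: set) -> bool:
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = START
    for ch in word:
        state = EDGES.get((state, ch), DEAD)
    return state in accepting

ACCEPTING = {DONE}
COMPLEMENT_ACCEPTING = STATES - ACCEPTING  # swap accepting and non-accepting

# Check every word of length 0..3 over the alphabet.
words = ["".join(p) for n in range(4) for p in product(ALPHABET, repeat=n)]
original = [w for w in words if accepts(w, ACCEPTING)]
always_opposite = all(
    accepts(w, ACCEPTING) != accepts(w, COMPLEMENT_ACCEPTING) for w in words
)
print(original)         # ['ac', 'bd']
print(always_opposite)  # True
```

Turning the complemented DFA back into a regular expression is the hard part and is not attempted here.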

Radu
Juliano
  • 2
    If I were dealing with actual regex's then this would all be moot. Regex now seems to refer to the nebulous CSG-ish (?) space of pattern matching that most languages support. Since I need to match (A and ~B), there's no way to remove the negation and still do it all in one step. – notnot Mar 04 '09 at 21:48
  • Lookahead, as described above, would have done it if findstr did anything beyond true DFA regexs. The whole thing is sort of odd and I don't know why I have to do this command-line (batch now) style. It's just another example of my hands being tied. – notnot Mar 04 '09 at 21:53
  • 1
    @notnot: You are using findstr from Windows? Then you just need /v. Like: `findstr A inputfile | findstr /v B > outputfile.txt`. The first matches all lines with A, the second matches all lines that don't have B. – Juliano Mar 04 '09 at 22:04
  • Thanks! That's actually exactly what I needed. I didn't ask the question that way, though, so I'm still giving the answer to Gumbo for the more generalized answer. – notnot Mar 05 '09 at 17:16
2

pattern - re

str.split(/re/g) 

will return everything except the pattern.
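The same idea can be sketched in Python with `re.split`, as an analogue of the JavaScript call above rather than a translation of it; the sample text and the pattern `\d+` are assumptions chosen for illustration:

```python
import re

text = "ab12cd345ef"

# Split on every run of digits; what remains is everything except the pattern.
pieces = re.split(r"\d+", text)
print(pieces)           # ['ab', 'cd', 'ef']

# Join the pieces again if a single string is wanted.
print("".join(pieces))  # abcdef
```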

Test here

unigogo
  • You probably want to mention that you need to join them again. – tomdemuyt Mar 26 '12 at 14:07
  • A similar approach is using `replace`: `str.replace(/re/g, '')`; then there's no need to rejoin them. Also, if you add a trailing `\s?`, as in `str.replace(/re\s?/g, '')`, you get rid of any duplicate spaces left where something was replaced in the middle of a string – jakecraige Jan 22 '14 at 06:28
0

My answer here might solve your problem as well:

https://stackoverflow.com/a/27967674/543814

  • Instead of Replace, you would use Match.
  • Instead of group $1, you would read group $2.
  • Group $2 was made non-capturing there, which you would avoid.

Example:

Regex.Match("50% of 50% is 25%", @"(\d+\%)|(.+?)");

The first capturing group specifies the pattern that you wish to avoid. The last capturing group captures everything else. Simply read out that group, $2.
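A Python analogue of this two-group approach, offered as a sketch rather than a port of the C# call: it uses `finditer` to walk over all matches (a single `Regex.Match` call returns only the first one), and the names `PATTERN`, `text` and `kept` are hypothetical:

```python
import re

# Group 1: the pattern to avoid (a percentage).
# Group 2: anything else, one character at a time because .+? is lazy.
PATTERN = re.compile(r"(\d+%)|(.+?)")

text = "50% of 50% is 25%"

# Read out only group 2 and stitch the characters back together.
kept = "".join(
    m.group(2) for m in PATTERN.finditer(text) if m.group(2) is not None
)
print(repr(kept))  # ' of  is '
```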

Community
Timo
0
(B)|(A)

then use what group 2 captures...

bobble bubble
DW.