Regex - how to match everything except a particular pattern

Question

How do I write a regex to match any string that doesn't meet a particular pattern? I'm faced with a situation where I have to match an (A and ~B) pattern.

PCRE would be best for this: see [Regex Pattern to Match, Excluding when… / Except between](https://stackoverflow.com/questions/23589174). I removed `findstr` tag since all answers here are not valid for the tag. — Wiktor Stribiżew, Mar 04 '20 at 09:22

score 198 · Accepted Answer · edited Jul 20 '16 at 00:08

You could use a look-ahead assertion:

(?!999)\d{3}

This example matches three digits other than 999.
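A minimal sketch of the look-ahead in Python, whose `re` module supports `(?!...)`. The sample string and the `\b` word boundaries are assumptions added for this illustration, so that only standalone three-digit numbers are considered:

```python
import re

# The \b anchors are an addition for this demo: they restrict matches to
# standalone three-digit numbers rather than any three consecutive digits.
not_999 = re.compile(r"\b(?!999)\d{3}\b")

sample = "123 999 000 998"  # hypothetical input
matches = not_999.findall(sample)
print(matches)  # ['123', '000', '998']
```

The look-ahead consumes no characters; it only vetoes a match that would start with `999`.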

But if you happen not to have a regular expression implementation with this feature (see Comparison of Regular Expression Flavors), you probably have to build a regular expression with the basic features on your own.

A compatible regular expression with basic syntax only would be:

[0-8]\d\d|\d[0-8]\d|\d\d[0-8]

This also matches any three-digit sequence that is not 999.
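One way to convince yourself that the two patterns agree is to test them against every three-digit string. This is only a brute-force check written in Python for illustration; the variable names are made up for the sketch:

```python
import re

lookahead = re.compile(r"(?!999)\d{3}")
basic = re.compile(r"[0-8]\d\d|\d[0-8]\d|\d\d[0-8]")

# Every three-digit string from "000" to "999".
candidates = [f"{n:03d}" for n in range(1000)]

# Strings on which the two patterns give different answers.
disagreements = [
    s for s in candidates
    if bool(lookahead.fullmatch(s)) != bool(basic.fullmatch(s))
]

# Strings that the basic-syntax pattern rejects.
rejected = [s for s in candidates if not basic.fullmatch(s)]

print(disagreements)  # []
print(rejected)       # ['999']
```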

edited Jul 20 '16 at 00:08 by Pro Backup

answered Mar 04 '09 at 18:41 by Gumbo

Look-ahead is not standard regular expression syntax; it is a Perl extension and will only work in Perl, PCRE (Perl-Compatible RegEx), or other non-standard implementations. – Juliano Mar 04 '09 at 19:26

It may not be standard, but don't most modern languages support it? What language *doesn't* support look-aheads these days? – Bryan Oakley Mar 04 '09 at 19:45

That’s true. But most regex flavors support this feature (see ). – Gumbo Mar 04 '09 at 19:49

Turns out that the Windows findstr function only supports pure DFA-style regex anyway, so I need to just do it all differently. You still get the answer, though. – notnot Mar 04 '09 at 21:38

I think the last regex would also not match 009, 019, etc. – Sebastian Viereck Sep 26 '13 at 10:03

Standard Lex for C does not use PCREs :-( – pieman72 Feb 02 '15 at 21:54

score 32 · Answer 2 · edited Feb 21 '13 at 14:00

Suppose you want to match lines that contain a word A but do not contain a word B. For example, given the text:

1. I have a two pets - dog and a cat
2. I have a pet - dog

If you want to search for lines of text that HAVE a dog for a pet and DON'T have a cat, you can use this regular expression:

^(?=.*?\bdog\b)((?!cat).)*$

It will match only the second line:

2. I have a pet - dog
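The same check can be sketched in Python, assuming each line is tested separately. The list of lines simply repeats the sample text above:

```python
import re

# (?=.*?\bdog\b) requires the word "dog" somewhere on the line;
# ((?!cat).)* then walks to the end only if "cat" never starts at any position.
pattern = re.compile(r"^(?=.*?\bdog\b)((?!cat).)*$")

lines = [
    "1. I have a two pets - dog and a cat",
    "2. I have a pet - dog",
]

matching = [line for line in lines if pattern.search(line)]
print(matching)  # ['2. I have a pet - dog']
```

Note that `cat` has no word boundaries here, so a line containing "catalog" would be rejected as well.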

edited Feb 21 '13 at 14:00

answered Feb 21 '13 at 11:26 by Aleks

He failed to mention it in the question, but the OP is actually using the DOS `findstr` command. It affords only a tiny subset of the capabilities you expect to find in a regex tool; lookahead is not among them. (I just added the [tag:findstr] tag myself.) – Alan Moore Feb 21 '13 at 13:42

Hm, yes, I have now found that in one of his comments on the posts. I saw "Regex" in the title. Anyway, if somebody finds this post when searching for the same thing for regular expressions, like I did, maybe it could be helpful to someone :) Thanks for the comments. – Aleks Feb 21 '13 at 13:59

score 15 · Answer 3 · answered Mar 04 '09 at 18:48

Match against the pattern and use the host language to invert the boolean result of the match. This will be much more legible and maintainable.
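A minimal sketch of this approach in Python, assuming A and B can be tested independently against the same string. The helper name `matches_a_and_not_b` and the sample patterns are hypothetical:

```python
import re

def matches_a_and_not_b(text, pattern_a, pattern_b):
    """Return True when pattern_a is found in text and pattern_b is not."""
    return bool(re.search(pattern_a, text)) and not re.search(pattern_b, text)

# Hypothetical patterns: A = the word "dog", B = the word "cat".
has_dog_only = matches_a_and_not_b("I have a pet - dog", r"\bdog\b", r"\bcat\b")
has_both = matches_a_and_not_b("dog and a cat", r"\bdog\b", r"\bcat\b")
print(has_dog_only, has_both)  # True False
```

The negation lives in ordinary code, so neither pattern needs look-around support.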

answered Mar 04 '09 at 18:48 by Ben S

Then I just end up with (~A or B) instead of (A and ~B). It doesn't solve my problem. – notnot Mar 04 '09 at 21:06

Pseudo-code: `String toTest; if (toTest.matches(A) AND !toTest.matches(B)) { ... }` – Ben S Mar 04 '09 at 21:54

I should have been clearer: the pieces are not fully independent. If A matches part of the string, then we care if ~B matches the rest of it (but not necessarily the whole thing). This was for the Windows command-line findstr function, which I found is restricted to true regexes, so it's a moot point. – notnot Mar 04 '09 at 22:07

score 8 · Answer 4 · edited May 23 '17 at 12:18

notnot, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

> I'm faced with a situation where I have to match an (A and ~B) pattern.

The basic regex for this is frighteningly simple: B|(A)

You just ignore the overall matches and examine the Group 1 captures, which will contain A.

An example (with all the disclaimers about parsing HTML with regex): A is digits, B is digits within an <a> tag

The regex: <a.*?<\/a>|(\d+)

Demo (look at Group 1 in the lower right pane)
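Since the demo is not reproduced here, the `B|(A)` technique can be sketched in Python as follows. The HTML snippet and variable names are invented for the illustration:

```python
import re

# Left alternative (B): a whole <a>...</a> element, matched but not captured.
# Right alternative (A): digits, captured in group 1.
pattern = re.compile(r"<a.*?</a>|(\d+)")

# Hypothetical input.
html = 'call 555 or see <a href="/page/42">item 7</a> and 2024'

# Keep only the matches where group 1 took part, i.e. digits outside links.
digits_outside_links = [
    m.group(1) for m in pattern.finditer(html) if m.group(1) is not None
]
print(digits_outside_links)  # ['555', '2024']
```

The link is consumed as one overall match, so the digits inside it (42 and 7) never get a chance to match the right-hand alternative.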

Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

edited May 23 '17 at 12:18 by Community

answered May 13 '14 at 21:51 by zx81

This sounds too good to be true! Unfortunately, this solution is not universal and it fails in Emacs, even after replacing `\d` with `[[:digit:]]`. [The first reference](https://stackoverflow.com/questions/23589174/regex-pattern-to-match-excluding-when-except-between) mentions it is specific to Perl and PHP: "There is a variation using syntax specific to Perl and PHP that accomplishes the same." – miguelmorin Oct 24 '18 at 12:43

score 4 · Answer 5 · edited Aug 21 '12 at 10:36

The complement of a regular language is also a regular language, but to construct it you have to build the DFA for the regular language and make any valid state change into an error. See this for an example. What the page doesn't say is that it converted /(ac|bd)/ into /(a[^c]?|b[^d]?|[^ab])/. The conversion from a DFA back to a regular expression is not trivial. It is easier if you can use the regular expression unchanged and change the semantics in code, as suggested before.
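To make the DFA idea concrete, here is a rough Python sketch. It is an illustration under several assumptions, not the construction used by the linked page: `(ac|bd)` is treated as a whole-string match, the alphabet is assumed to be just `a`, `b`, `c`, `d`, the DFA is written by hand rather than derived from the regex, and the complement uses the textbook method of swapping accepting and non-accepting states of a complete DFA. All names are hypothetical.

```python
ALPHABET = "abcd"  # assumed alphabet for this sketch
START, SEEN_A, SEEN_B, DONE, DEAD = range(5)
ALL_STATES = {START, SEEN_A, SEEN_B, DONE, DEAD}

# Hand-written transitions for the language {"ac", "bd"}.
# Any transition not listed leads to DEAD, which makes the DFA complete.
TRANSITIONS = {
    (START, "a"): SEEN_A,
    (START, "b"): SEEN_B,
    (SEEN_A, "c"): DONE,
    (SEEN_B, "d"): DONE,
}
ACCEPTING = {DONE}

def run(text):
    """Return the state the DFA ends in after reading text."""
    state = START
    for ch in text:
        if ch not in ALPHABET:
            raise ValueError(f"character {ch!r} is outside the alphabet")
        state = TRANSITIONS.get((state, ch), DEAD)
    return state

def accepts(text):
    """True when text is exactly 'ac' or 'bd'."""
    return run(text) in ACCEPTING

def complement_accepts(text):
    """Same DFA with accepting and non-accepting states swapped."""
    return run(text) in ALL_STATES - ACCEPTING

print(accepts("ac"), complement_accepts("ac"))  # True False
print(accepts("ad"), complement_accepts("ad"))  # False True
```

Swapping the accepting set is the easy part; turning the complemented DFA back into a regular expression is the step the answer describes as non-trivial.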

edited Aug 21 '12 at 10:36 by Radu

answered Mar 04 '09 at 19:11 by Juliano

If I were dealing with actual regexes then this would all be moot. Regex now seems to refer to the nebulous CSG-ish (?) space of pattern matching that most languages support. Since I need to match (A and ~B), there's no way to remove the negation and still do it all in one step. – notnot Mar 04 '09 at 21:48

Lookahead, as described above, would have done it if findstr did anything beyond true DFA regexes. The whole thing is sort of odd and I don't know why I have to do this command-line (batch now) style. It's just another example of my hands being tied. – notnot Mar 04 '09 at 21:53

@notnot: You are using findstr from Windows? Then you just need /v. Like: `findstr A inputfile | findstr /v B > outputfile.txt`. The first matches all lines with A; the second matches all lines that don't have B. – Juliano Mar 04 '09 at 22:04

Thanks! That's actually exactly what I needed. I didn't ask the question that way, though, so I'm still giving the answer to Gumbo for the more generalized answer. – notnot Mar 05 '09 at 17:16

score 2 · Answer 6 · answered Mar 05 '09 at 02:26

Where `re` is the pattern:

str.split(/re/g)

will return everything except the pattern.
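The snippet above is JavaScript. A comparable sketch in Python, swapping in `re.split` from the standard library, might look like this; the sample text and the `\d+` pattern are assumptions for the demo:

```python
import re

text = "alpha123beta45gamma"  # hypothetical input

# Split on the unwanted pattern; the pieces are everything except the matches.
parts = re.split(r"\d+", text)
print(parts)  # ['alpha', 'beta', 'gamma']

# Rejoin the pieces if a single string is wanted.
joined = "".join(parts)
print(joined)  # alphabetagamma
```

In Python, a pattern that contains capturing groups makes `re.split` include the captured text in the result, so the pattern here is deliberately group-free.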

answered Mar 05 '09 at 02:26 by unigogo

You probably want to mention that you need to join them again. – tomdemuyt Mar 26 '12 at 14:07

A similar approach is using `replace`: `str.replace(/re/g, '')`; then there's no need to rejoin them. Also, if you throw in a nice trailing `\s?`, like `str.replace(/re\s?/g, '')`, then you get rid of any duplicate spaces you would have from something being replaced in the middle of a string. – jakecraige Jan 22 '14 at 06:28

score 0 · Answer 7 · edited May 23 '17 at 11:47

My answer here might solve your problem as well:

https://stackoverflow.com/a/27967674/543814

Instead of Replace, you would use Match.
Instead of group $1, you would read group $2.
Group $2 was made non-capturing there, which you would avoid.

Example:

Regex.Match("50% of 50% is 25%", @"(\d+\%)|(.+?)");

The first capturing group specifies the pattern that you wish to avoid. The last capturing group captures everything else. Simply read out that group, $2.
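The example above is C#. As a rough illustration of reading out group 2 across the whole input, here is the same pattern in Python, iterating over all matches with `re.finditer` rather than taking only the first one; the variable names are made up for the sketch:

```python
import re

# Group 1: the pattern to avoid (a percentage).
# Group 2: anything else, one character at a time because .+? is lazy.
pattern = re.compile(r"(\d+%)|(.+?)")

text = "50% of 50% is 25%"

# Collect group 2 from every match and glue the characters back together.
rest = "".join(
    m.group(2) for m in pattern.finditer(text) if m.group(2) is not None
)
print(repr(rest))  # ' of  is '
```

Because the lazy `.+?` matches a single character per match, the pieces have to be concatenated to recover the text between the avoided parts.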

score 0 · Answer 8 · edited Jul 22 '17 at 20:41

(B)|(A)

then use what group 2 captures...

edited Jul 22 '17 at 20:41 by bobble bubble

answered Mar 05 '09 at 02:29 by DW.

He needs to *capture* not B; his aim is not to just ignore all the B patterns. – hexicle Jul 16 '13 at 06:30
