160

I'm processing a file, line-by-line, and I'd like to do an inverse match. For instance, I want to match lines where there is a string of six letters, but only if these six letters are not 'Andrea'. How should I do that?

I'm using RegexBuddy, but still having trouble.

alex
Andrea Ambu
  • It actually sounds like you might do better to give us a bit more information about what you're doing, and see if someone can offer an alternative solution. Typically, attempting to parse an entire file by constructing a regular expression that matches each line is a rather complicated route :) – Dan Oct 02 '08 at 20:33

10 Answers

96
(?!Andrea).{6}

Assuming your regexp engine supports negative lookaheads...

...or maybe you'd prefer to use [A-Za-z]{6} in place of .{6}

Note that lookaheads and lookbehinds are generally not the right way to "inverse" a regular expression match. Regexps aren't really set up for doing negative matching; they leave that to whatever language you are using them with.
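To see what the unanchored pattern actually does, here is a minimal Python sketch. It assumes the `[A-Za-z]{6}` variant suggested above; the helper name `first_six_letter_run` and the sample strings are made up for illustration.

```python
import re

# Six letters, as long as those six letters are not exactly "Andrea".
PATTERN = re.compile(r"(?!Andrea)[A-Za-z]{6}")

def first_six_letter_run(line):
    """Return the first six-letter run accepted by PATTERN, or None."""
    match = PATTERN.search(line)
    return match.group(0) if match else None

for sample in ["Andrea", "Andrew", "Andreas", "Bob"]:
    print(sample, "->", first_six_letter_run(sample))
```

Note that `Andreas` still produces a match (`ndreas`), because the engine is free to retry one character further along. If that is not what you want, anchoring the pattern (for example with `^`) is one way to prevent it.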

Dan
  • 1
    You need to add the ^ that @Vinko Vrsalovic uses so that it won't match on "ndrea\n" – bdukes Oct 02 '08 at 20:34
  • 2
    . doesn't match \n by default (some languages [eg Perl] allow you to switch on that behaviour, but by default . matches everything BUT \n). – Dan Oct 02 '08 at 20:36
  • 1
    (plus, the OP never mentioned the string had to occur at the start of the line) – Dan Oct 02 '08 at 20:37
  • 1
    Andrea: OP means "original poster", so, I was referring to you :) – Dan Oct 02 '08 at 20:58
  • Dan: ok i did not learn the SO slang yet :P Thank you :) The same thing is commented on the Vinko Vrsalovic answer – Andrea Ambu Oct 02 '08 at 21:08
  • I'm guessing the {6} is set to 6 because that is the length of the string "Andrea", but if this _is_ the case, it should be made clear in the answer. – Shabbyrobe May 24 '11 at 01:07
  • This only works for strings that are exactly 6 characters long, as requested. Dmytro shared the answer for any length strings [here](http://stackoverflow.com/a/1909960/819417). – Cees Timmerman Jun 20 '13 at 15:24
61

For Python/Java,

^(.(?!(some text)))*$

http://www.lisnichenko.com/articles/javapython-inverse-regex.html

Rahul
Dmytro
  • 6
    This doesn't work. You're thinking of the Tempered Greedy Token idiom, but the dot has to go *after* the lookahead, not before. See [this question](http://stackoverflow.com/questions/30900794/tempered-greedy-token-what-is-different-about-placing-the-dot-before-the-negat). But that approach is overkill for this task anyway. – Alan Moore Aug 09 '16 at 09:42
  • Don't know which language it is written in, but worked like a charm in Sublime text to clean up my test data. Thanks! – Matthias dirickx May 04 '17 at 11:21
  • 1
    @AlanMoore Actually, it'll _almost_ work for this use case. However, if `some text` starts the line, it will return the wrong result. – Zenexer Aug 12 '17 at 08:35
  • 2
    @Zenexer, that's what I meant. If the dot is after the lookahead instead of before, it works perfectly. – Alan Moore Aug 14 '17 at 18:09
  • Here is a [link](https://superuser.com/a/1334072/411849) that explains more. I do not understand why `?!` and not just `!`. – Timo May 07 '19 at 09:10
  • See [Tempered Greedy Token - What is different about placing the dot before the negative lookahead](https://stackoverflow.com/a/37343088/3832970) to understand why this answer is wrong. – Wiktor Stribiżew Aug 27 '21 at 14:13
  • The link is broken: *"Unable to connect. Firefox can’t establish a connection to the server at www.lisnichenko.com."* – Peter Mortensen Oct 08 '21 at 17:50
45

In PCRE and similar variants, you can actually create a regex that matches any line not containing a value:

^(?:(?!Andrea).)*$

This is called a tempered greedy token. The downside is that it doesn't perform well.
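The position of the dot relative to the lookahead matters. The Python sketch below compares this pattern with the dot-first variant `^(.(?!(some text)))*$` from the Python/Java answer, substituting `Andrea` for `some text`. The helper `accepted` and the sample lines are assumptions made for illustration.

```python
import re

# Lookahead is checked before each character is consumed.
TEMPERED = re.compile(r"^(?:(?!Andrea).)*$")
# Dot-first variant: the lookahead is only checked after a character
# has been consumed, so position 0 is never tested.
DOT_FIRST = re.compile(r"^(.(?!(Andrea)))*$")

def accepted(pattern, line):
    """True if the pattern treats the line as 'not containing Andrea'."""
    return pattern.search(line) is not None

for line in ["alex", "Hi Andrea", "Andrea Ambu"]:
    print(f"{line!r}: tempered={accepted(TEMPERED, line)}, "
          f"dot_first={accepted(DOT_FIRST, line)}")
```

The two patterns agree on the first two lines. They differ only on `Andrea Ambu`, where the excluded text starts the line: the dot-first variant accepts it, the tempered version does not.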

Peter Mortensen
Zenexer
  • 2
    This is the Tempered Greedy Token in long form. Just put the dot (or `[\s\S]`, which is only useful in JavaScript) after the second lookahead, and you don't need the first one: `^(?:(?!Andrea).)*$`. – Alan Moore Aug 09 '16 at 10:00
  • 1
    @AlanMoore Nice! I couldn't find any established pattern that worked like that, so I came up with my own. Rather than me taking your answer, you should provide that as your own. – Zenexer Aug 23 '16 at 06:11
  • 1
    That's okay, there are already plenty of good answers. And you deserve credit for inventing the idiom on your own. Cheers! – Alan Moore Aug 23 '16 at 13:57
  • Why do you suggest using `[\S\s]`? OP is talking about matching lines, not containing "Andrea" word. Not about checking if the whole string contains this word. Am I missing something? – x-yuri Jul 29 '17 at 06:37
  • @x-yuri I think you're right. I probably answered the question I had when I first visited this page, ignoring the discrepancy. My connection isn't good enough to update the answer right now, though (< 10 kbps) – Zenexer Jul 29 '17 at 07:04
  • Okay, undeleting this and making an attempt at cleaning this up. – Zenexer Aug 12 '17 at 08:30
  • Thank you! This answer was actually handy, because I was struggling with understanding how Black formatter handles the pattern provided to `--exclude`. I was able to set it up to ignore everything except files in a few directories. – vintprox Nov 21 '21 at 11:15
  • Worked in search/replace visual code feature – JRichardsz Aug 02 '23 at 15:41
13

The capabilities and syntax of the regex implementation matter.

You could use look-ahead. Using Python as an example,

import re

not_andrea = re.compile(r'(?!Andrea)\w{6}', re.IGNORECASE)

To break that down:

(?!Andrea) means 'match if the next 6 characters are not "Andrea"'; if so then

\w means a "word character": a letter, a digit, or an underscore. For ASCII text this is equivalent to the class [a-zA-Z0-9_]

\w{6} means exactly six word characters.

re.IGNORECASE means that you will exclude "Andrea", "andrea", "ANDREA" ...

Another way is to use your program logic - use all lines not matching Andrea and put them through a second regex to check for six characters. Or first check for at least six word characters, and then check that it does not match Andrea.
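The program-logic alternative can be sketched as two simple checks in Python. The function name `wanted` and the sample lines are hypothetical, and the sketch assumes that "not matching Andrea" means the line does not contain it anywhere, in any letter case.

```python
import re

CONTAINS_ANDREA = re.compile(r"Andrea", re.IGNORECASE)
SIX_WORD_CHARS = re.compile(r"\w{6}")

def wanted(line):
    """Keep lines with at least six consecutive word characters and no 'Andrea'."""
    if CONTAINS_ANDREA.search(line):
        return False  # first check: reject anything mentioning Andrea
    return SIX_WORD_CHARS.search(line) is not None  # second check: six word chars

lines = ["Andrea Ambu", "Vinko Vrsalovic", "Dan"]
print([line for line in lines if wanted(line)])
```

Each regex stays trivial, and the negation lives in ordinary code, where it is easy to read and change.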

Peter Mortensen
Hamish Downer
8

Negative lookahead assertion

(?!Andrea)

This is not exactly an inverted match, but it's the best you can directly do with regex. Not all platforms support them though.

Vinko Vrsalovic
6

If you want to do this in RegexBuddy, there are two ways to get a list of all lines not matching a regex.

On the toolbar on the Test panel, set the test scope to "Line by line". When you do that, an item List All Lines without Matches will appear under the List All button on the same toolbar. (If you don't see the List All button, click the Match button in the main toolbar.)

On the GREP panel, you can turn on the "line-based" and the "invert results" checkboxes to get a list of non-matching lines in the files you're grepping through.

Jan Goyvaerts
5

(?!...) is useful in practice, although strictly speaking lookahead is not part of regular expressions as they are defined mathematically.

You can write an inverted regular expression manually.

Here is a program to calculate the result automatically. Its result is machine generated, which is usually much more complex than hand writing one. But the result works.
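As a small illustration of a hand-written inverse with no lookahead, here is a Python sketch for a deliberately tiny case: strings that do not contain `ab`. The pattern is my own construction, not output from the program mentioned above. The excluded text is kept to two characters because hand-written inverses grow quickly with its length.

```python
import re
from itertools import product

# Hand-written complement of "contains ab", using no lookahead:
# repeat either a non-'a' character, or a run of 'a's followed by
# something other than 'a' or 'b'; then allow a trailing run of 'a's.
NO_AB = re.compile(r"(?:[^a]|a+[^ab])*a*")

def lacks_ab(text):
    """True if the whole text matches the hand-written inverse pattern."""
    return NO_AB.fullmatch(text) is not None

# Brute-force check against plain substring search on all short strings
# over the alphabet {a, b, c}.
for length in range(6):
    for chars in product("abc", repeat=length):
        text = "".join(chars)
        assert lacks_ab(text) == ("ab" not in text)
```

Doing the same for a six-letter word such as `Andrea` is possible in principle, but the result is much harder to read than the lookahead version.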

Peter Mortensen
weakish
5

I just came up with this method, which may be resource-intensive, but it works:

You can replace all characters that match the regex with an empty string.

This is a one-liner:

notMatched = re.sub(regex, "", string)

I used this because I was forced to use a very complex regex and couldn't figure out how to invert every part of it within a reasonable amount of time.

This will only return you the string result, not any match objects!
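A minimal, self-contained version of that one-liner might look like this. The wrapper name `remove_matches`, the pattern, and the sample text are assumptions chosen for illustration.

```python
import re

def remove_matches(regex, text):
    """Return text with every match of regex deleted (the 'not matched' remainder)."""
    return re.sub(regex, "", text)

# Remove "Andrea" plus any whitespace that follows it.
print(remove_matches(r"Andrea\s*", "alex, Andrea Ambu, Dan"))
```

The remainder comes back as a single string, so the positions of the unmatched pieces are lost.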

Matthias Herrmann
3

If you can run two regex matches for the inverse and join the results, you can use two capturing groups: first capture everything before your regex

^((?!yourRegex).)*

and then capture everything behind your regex

(?<=yourRegex).*

This works for most regexes. One problem I discovered was when I had a quantifier like {2,4} at the end. Then you gotta get creative.
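Here is a rough Python sketch of the two-match idea, with the literal `Andrea` standing in for `yourRegex`. The function name `split_around` is hypothetical. Python's `re` module only accepts fixed-width lookbehind, which may be related to the quantifier problem described above.

```python
import re

def split_around(text):
    """Return (text before the first 'Andrea', text after the first 'Andrea')."""
    # Consume characters for as long as "Andrea" does not start at the current position.
    before = re.match(r"^((?!Andrea).)*", text).group(0)
    # Start matching at the first position that is immediately preceded by "Andrea".
    after_match = re.search(r"(?<=Andrea).*", text)
    after = after_match.group(0) if after_match else ""
    return before, after

print(split_around("Hello Andrea Ambu"))
```

When the text does not contain `Andrea` at all, the first part is the whole line and the second part is empty.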

Dodo
-4

In Perl you can do:

process($line) if ($line !~ /Andrea/);
Peter Mortensen
phreakre