133

I am attempting to grep for all instances of Ui\. not followed by Line or even just the letter L

What is the proper way to write a regex for finding all instances of a particular string NOT followed by another string?

Using lookaheads

grep "Ui\.(?!L)" *
bash: !L: event not found


grep "Ui\.(?!(Line))" *
nothing
Lee Quarella
  • 5
    Which sub-species of regex - PCRE, ERE, BRE, grep, ed, sed, perl, python, Java, C, ...? – Jonathan Leffler Feb 08 '12 at 16:53
  • 5
    As an aside, the "event not found" comes from using history expansion. You might want to turn off history expansion if you never use it, and sometimes want to be able to use an exclamation mark in your interactive commands. `set +o histexpand` in Bash or `set +H`, YMMV. – tripleee Feb 08 '12 at 19:23
  • 15
    I also had the history expansion issue. I *think* I solved it simply by switching to single quotes, so the shell wouldn't try to munge the argument. – Coderer Sep 17 '12 at 08:55
  • @Coderer Using a single quote is all very well - except for when you want other (most commonly, $) metacharacters to be active. '' protect all metachars from the shell, which is only occasionally what you want. tripleee's comment is the best way of handling this, IMO. – Graham Nicholls Nov 26 '21 at 15:37
  • How often do you want other metacharacters to be active *inside* a regexp, though? If you're building your regexp dynamically by splatting in an environment variable or something, you're probably doing it wrong. – Coderer Nov 29 '21 at 13:59

6 Answers

193

Negative lookahead, which is what you're after, requires a more powerful tool than the standard grep. You need a PCRE-enabled grep.

If you have GNU grep, the current version supports options -P or --perl-regexp and you can then use the regex you wanted.

If you don't have (a sufficiently recent version of) GNU grep, then consider getting ack.
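As a rough sketch of what that looks like in practice (assuming a GNU grep built with PCRE support; the sample file and its contents are made up for illustration):

```shell
# Build a small throwaway sample file (hypothetical contents).
tmpdir=$(mktemp -d)
printf '%s\n' 'Ui.Line(1)' 'Ui.Label(2)' 'Ui.Button(3)' 'Ui.' > "$tmpdir/sample.txt"

# -P switches GNU grep to PCRE, which understands the (?!...) lookahead.
# Single quotes keep an interactive bash from treating "!" as history expansion.
no_line=$(grep -P 'Ui\.(?!Line)' "$tmpdir/sample.txt")
no_l=$(grep -P 'Ui\.(?!L)' "$tmpdir/sample.txt")

printf 'not followed by Line:\n%s\n' "$no_line"
printf 'not followed by L:\n%s\n' "$no_l"

rm -r "$tmpdir"
```

The first pattern keeps the `Ui.Label(2)` line because the lookahead only rejects the full word `Line`; the second rejects anything starting with `L`.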

Jonathan Leffler
  • 44
    I am pretty sure the problem in this case is just that in bash you should use single quotes not double quotes so it won't treat `!` as a special character. – NHDaly Sep 18 '13 at 21:26
  • (see below for my answer describing exactly that.) – NHDaly May 11 '14 at 16:14
  • 4
    Verified, correct answer should be combining this answer and @NHDaly's comment. For example, this command works for me: **grep -P '^.*contains((?!but_not_this).)*$' \*.log.* >"D:\temp\result.out"** – wangf May 29 '15 at 02:47
  • 3
    For those where `-P` is not supported try piping result again to `grep --invert-match`, ex: `git log --diff-filter=D --summary | grep -E 'delete.*? src' | grep -E --invert-match 'xml'`. Make sure to upvote @Vinicius Ottoni's answer. – Daniel Sokolowski Nov 17 '15 at 20:37
  • @wangf I'm using Bash under Cygwin and when I change to single quotes, I'm still getting the error "event not found". – SSilk Jun 23 '17 at 13:44
  • @SSilk: can you show the command you type? If you're using Bash, the single quotes around the `!` should fix the 'event not found' problem. Or you can turn the notation off: `set +H` disables it (since Bash documents that the [`set` builtin](https://www.gnu.org/software/bash/manual/bash.html#The-Set-Builtin) uses `-H` to enable it). – Jonathan Leffler Jun 23 '17 at 13:49
  • @SrikanthSharma, that’s to be expected. Mojave uses BSD grep 2.5.1 and not GNU grep, as you’d see if you ran `grep --version`. – Jonathan Leffler Sep 18 '19 at 15:13
47

The answer to part of your problem is here, and ack would behave the same way: Ack & negative lookahead giving errors

You are using double-quotes for grep, which permits bash to "interpret ! as history expand command."

You need to wrap your pattern in SINGLE-QUOTES: grep 'Ui\.(?!L)' *

However, see @JonathanLeffler's answer to address the issues with negative lookaheads in standard grep!

NHDaly
  • You are confusing the extension functionality of GNU `grep` with the functionality of standard `grep`, where the standard for [`grep`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html) is POSIX. What you say is also true — I run Bash with the C-shell barbarisms disabled (because if I wanted a C shell, I'd use one, but I don't want one), so the `!` stuff doesn't affect me — but to get negative lookaheads, you need non-standard `grep`. – Jonathan Leffler May 11 '14 at 16:22
  • 1
    @JonathanLeffler, thanks for the clarification; I think you are right that it requires both of our answers to address all of the OP's symptoms. Thanks. – NHDaly May 12 '14 at 19:49
  • 1
    By using `-E` option with this negative lookahead, it gives `grep: repetition-operator operand invalid` :( – Jerry Green Nov 07 '20 at 10:56
14

You probably can't perform standard negative lookaheads using grep, but you can often get equivalent behaviour with the inverse-match switch '-v'. Construct a regex for the complement of what you want to match, then pipe the output through two greps.

For the regex in question you might do something like

grep 'Ui\.' * | grep -v 'Ui\.L'

(Edit: this is not as strong as a true lookahead, but can often be used to work around the problem.)
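A small sketch of the two-stage filter and of where it falls short; the sample lines are invented for illustration:

```shell
# Throwaway sample file (hypothetical contents).
tmpdir=$(mktemp -d)
printf '%s\n' 'a = Ui.Line' 'b = Ui.Button' 'c = Ui.Line + Ui.Button' > "$tmpdir/sample.txt"

# Stage 1 keeps lines containing "Ui."; stage 2 drops every line containing "Ui.L".
piped=$(grep 'Ui\.' "$tmpdir/sample.txt" | grep -v 'Ui\.L')
printf '%s\n' "$piped"

rm -r "$tmpdir"
```

Only the `b = Ui.Button` line survives. The third line is dropped even though it contains a `Ui.` that is not followed by `L`, because `-v` filters whole lines rather than individual matches.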

Karel Tucek
  • 1
    That would exclude more than intended, for instance when a line contains both Ui.Line and a Ui. that is not followed by Line – nafg Jul 23 '17 at 05:15
  • 1
    (Yes, that's why I did not state it strictly. This simply solves a significant portion of the scenarios that lead people to this problem, nothing more.) – Karel Tucek Aug 20 '17 at 19:14
  • This answer inspired my final solution which was to use `sed` (available on busybox/alpine) to replace the matched parts with nothing, ie. `grep 'match.+' | sed 's/match//'` – lionello Feb 16 '23 at 19:54
7

If you need to use a regex implementation that doesn't support negative lookaheads and you don't mind matching extra character(s)*, then you can use negated character classes [^L], alternation |, and the end of string anchor $.

In your case grep 'Ui\.\([^L]\|$\)' * does the job.

  • Ui\. matches the string you're interested in

  • \([^L]\|$\) matches any single character other than L or it matches the end of the line: [^L] or $.

If you want to exclude more than just one character, then you just need to throw more alternation and negation at it. To find a not followed by bc:

grep 'a\(\([^b]\|$\)\|\(b\([^c]\|$\)\)\)' *

Which is either (a followed by not b, or a followed by the end of the line: a, then [^b] or $) or (a followed by b, which is in turn followed by not c or by the end of the line: a, then b, then [^c] or $).

This kind of expression gets to be pretty unwieldy and error prone with even a short string. You could write something to generate the expressions for you, but it'd probably be easier to just use a regex implementation that supports negative lookaheads.

*If your implementation supports non-capturing groups then you can avoid capturing the extra character, although it is still part of the overall match.
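To check that the "a not followed by bc" expression behaves as described, here is a small sketch. It uses the ERE spelling of the same pattern (`grep -E`, so the `\(`, `\|`, `\)` become `(`, `|`, `)`), and the sample lines are made up:

```shell
# Throwaway sample file (hypothetical contents).
tmpdir=$(mktemp -d)
printf '%s\n' 'abc' 'abd' 'ac' 'a' 'ab' 'xabcay' > "$tmpdir/sample.txt"

# ERE form of: a, then ([^b] or end of line) or (b, then [^c] or end of line).
hits=$(grep -E 'a(([^b]|$)|(b([^c]|$)))' "$tmpdir/sample.txt")
printf '%s\n' "$hits"

rm -r "$tmpdir"
```

Only the `abc` line is rejected. `xabcay` is kept because its second `a` is followed by `y`, which is the same result a true `a(?!bc)` lookahead would give for that line.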

dougcosine
3

If your grep doesn't support -P or --perl-regexp, and you can install a PCRE-enabled grep such as "pcregrep", then you don't need any command-line option (unlike GNU grep) for it to accept Perl-compatible regular expressions. You just run

pcregrep 'Ui\.(?!Line)' *

You don't need another nested group for "Line" as in your example "Ui.(?!(Line))" -- the outer group is sufficient, like I've shown above.

Here is another example of a negative lookahead assertion: when "ipset" returns a list of lines, each showing a packet count in the middle of the line, and you don't want the lines with zero packets, you just run:

ipset list | pcregrep "packets(?! 0 )"

If you like Perl-compatible regular expressions and have perl, but don't have pcregrep and your grep doesn't support --perl-regexp, you can use one-line perl scripts that work the same way as grep:

perl -e 'while (<>) {if (/Ui\.(?!Line)/){print;};}'

Perl reads stdin the same way grep does, e.g.

ipset list | perl -e "while (<>) {if (/packets(?! 0 )/){print;};}"
Maxim Masiutin
3

At least for the case of not wanting an 'L' character after the "Ui." you don't really need PCRE.

    grep -E 'Ui\.($|[^L])' *

Here I've made sure to match the special case of the "Ui." at the end of the line.
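To see what this pattern actually matches, a quick sketch using `-o` (print only the matched text; GNU and BSD grep both support it, though POSIX does not require it). The sample lines are made up:

```shell
# Throwaway sample file (hypothetical contents).
tmpdir=$(mktemp -d)
printf '%s\n' 'Ui.Line' 'Ui.Button' 'Ui.' > "$tmpdir/sample.txt"

# Whole-line output: lines where "Ui." is not followed by "L".
lines=$(grep -E 'Ui\.($|[^L])' "$tmpdir/sample.txt")

# -o shows only the matched text, exposing the one extra character consumed.
matched=$(grep -oE 'Ui\.($|[^L])' "$tmpdir/sample.txt")

printf 'lines:\n%s\nmatched text:\n%s\n' "$lines" "$matched"

rm -r "$tmpdir"
```

The line selection is what you want, but the matched text for `Ui.Button` is `Ui.B` rather than `Ui.`, which matters if you rely on `-o` or on match highlighting.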