
I need to find (case-insensitively) all variations on a word, including ones I can't predict, except for one particular variation.

I want

accept
acceptance
acceptable
accepting

...but not "acception." A coworker used it when he meant "exception." A lot.

Since I can't anticipate the variations (or typos), I need to allow things like "acceptjunk" and "acceptMacarena".

I thought I could accomplish this with a negative lookahead, but this didn't give the results I needed:

grep -iE '(?!acception)(accept[a-zA-Z]*)[[:space:]]' file

The trick is that I can accept (har) lines that contain "acception," provided that other words on the line match. For example, this line is okay to match:

The acceptance of the inevitable is the acception.

...otherwise by now I'd have piped grep through grep -v and been done with it:

grep -iE '(accept)[a-zA-Z]*[[:space:]]' file | grep -vi 'acception'
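A small sketch of why that two-stage pipeline falls short: the second grep throws away every line containing "acception", even a line that also has a wanted word. The helper name `two_stage` and the last two sample lines are made up for illustration; the first sample line is the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the two-stage pipeline from above, reading stdin.
two_stage() {
  # Stage 1 keeps lines with a word starting with "accept" followed by whitespace;
  # stage 2 then discards ANY line containing "acception", good words or not.
  grep -iE 'accept[a-z]*[[:space:]]' | grep -vi 'acception'
}

printf '%s\n' \
  'The acceptance of the inevitable is the acception.' \
  'That was the acception to the rule.' \
  'We are accepting applications now.' | two_stage
# Only the third line is printed; the first is lost despite "acceptance".
```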

I've found some questions that are similar and many that are not quite so. Using a-zA-Z is likely unnecessary with grep -i, but I'm flailing. I'm probably missing something small or basic, but I'm missing it nonetheless. What is it?

Thanks for reading.

PS: I'm not married to grep--but I am operating in bash--so if there's a magic awk command that would do this I'm all ears (eyes).

PPS: I forgot to mention that the lookahead above seemed to work on https://regex101.com/, but it doesn't work in my full grep command.

zedmelon
  • `missing something small or basic`: grep doesn't support lookarounds, unless you have GNU grep with -P support. – Sundeep Feb 26 '18 at 04:05
  • Precisely. I'm the guy in the movie who dies just before the hero makes it out alive. Thanks Sundeep. – zedmelon Feb 26 '18 at 04:19

1 Answer


To use lookarounds, you need GNU grep with PCRE support (the -P option):

grep -iP '(?!acception)(accept[a-z]*)[[:space:]]' file
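If -P isn't available (some greps, such as BSD grep, may lack it), a negative lookahead on one fixed word can be emulated in plain ERE by spelling out which suffixes are allowed after "accept". This is a sketch, not a general technique for arbitrary lookaheads; it assumes the word consists of ASCII letters, and the helper name `match_accept_ere` and extra sample lines are hypothetical. `LC_ALL=C` is set so the bracket ranges are plain ASCII.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ERE-only stand-in for (?!acception)accept[a-z]*.
# After "accept", allow either nothing, or letters that do not start with "ion":
#   [a-hj-z][a-z]*          first letter is anything but "i"
#   i([a-np-z][a-z]*)?      "i", optionally followed by a non-"o" letter
#   io([a-mo-z][a-z]*)?     "io", optionally followed by a non-"n" letter
# LC_ALL=C keeps the bracket ranges plain ASCII.
match_accept_ere() {
  LC_ALL=C grep -iE 'accept([a-hj-z][a-z]*|i([a-np-z][a-z]*)?|io([a-mo-z][a-z]*)?)?[[:space:]]'
}

printf '%s\n' \
  'The acceptance of the inevitable is the acception.' \
  'That was the acception to the rule.' \
  'acceptMacarena is a typo' | match_accept_ere
# Prints the first and third lines.
```

Like the PCRE version, this also rejects words that merely begin with "acception" (e.g. "acceptionable").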


With awk, this might work:

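awk '{ip=$0; $0=tolower($0); gsub(/acception/, ""); if (/accept[a-z]*[[:space:]]/) print ip}' file
  • ip=$0 saves the original input line
  • $0=tolower($0) lowercases the working copy, since awk regexes are case-sensitive (GNU awk also offers IGNORECASE=1)
  • gsub(/acception/, "") removes every occurrence of the unwanted word (sub would remove only the first); other unwanted words can be added with alternation
  • if (/accept[a-z]*[[:space:]]/) print ip prints the saved line if it still contains a word being searched for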
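A runnable sketch of the same remove-then-test idea, wrapped in a function and fed sample lines. It keeps a lowercased copy in a separate variable instead of overwriting $0, so the original line can be printed directly. The helper name `match_accept_awk` and the second and third sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove-then-test filter in awk, reading stdin.
match_accept_awk() {
  awk '{
    line = tolower($0)            # lowercase copy, so the tests ignore case
    gsub(/acception/, "", line)   # delete every "acception", not just the first
    if (line ~ /accept[a-z]*[[:space:]]/) print $0   # print the untouched original
  }'
}

printf '%s\n' \
  'The acceptance of the inevitable is the acception.' \
  'Acception after acception is tiring.' \
  'We are accepting applications now.' | match_accept_awk
# Prints the first and third lines.
```

With sub() instead of gsub(), the second line would be printed too: removing only the first "acception" leaves "acception " behind, which still matches.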
Sundeep
  • Perfect; thank you! Changing E to P did it. What's lame is that just a couple of weeks ago I used PCRE for something else. It was in short-term memory but not yet committed to long-term knowledge. Now it is. :,/ – zedmelon Feb 26 '18 at 04:21
  • Is there a reason I wouldn't want to replace my habit and permanently switch from `grep -E` to always using `grep -P` instead? Other than "*some systems won't support PCRE*" I mean? – zedmelon Feb 26 '18 at 04:31
  • One reason I can think of is that `grep -E` would be faster in many cases, while `grep -P` would generally be faster if back-references are involved. Then there's portability: `grep -P` isn't available if GNU grep or the PCRE library isn't. – Sundeep Feb 26 '18 at 04:35
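On the portability point, one option is to probe for -P at run time and fall back to a plain ERE when it's missing. This is a sketch under assumptions: the helper name `match_accept` is hypothetical, the probe simply checks whether `grep -qP` can match a trivial input (an unsupported -P makes grep exit with an error), and the fallback regex spells out the "not followed by ion" condition for this one word only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: use PCRE (grep -P) when this grep supports it,
# otherwise fall back to a plain ERE that spells out the lookahead.
match_accept() {
  # Probe: succeeds only if -P is accepted and actually matches.
  if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    grep -iP '(?!acception)accept[a-z]*[[:space:]]'
  else
    # After "accept": no letters, or letters that do not begin with "ion".
    LC_ALL=C grep -iE 'accept([a-hj-z][a-z]*|i([a-np-z][a-z]*)?|io([a-mo-z][a-z]*)?)?[[:space:]]'
  fi
}

printf '%s\n' \
  'The acceptance of the inevitable is the acception.' \
  'That was the acception to the rule.' \
  'We are accepting applications now.' | match_accept
# Either branch prints the first and third lines.
```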