
I have a lot of data in a directory, and I want to find every instance of a doubled word that isn't a number word. I started with this pattern, which I found in another question:

\b(\w+) \1\b

And expanded it to include what I don't want in the results:

(?!(?:one|two|three|four|five|six|seven|eight|nine|oh|zero))\b(\w+) \1\b

This works when I put it into regex101 as a Python expression (since that's all I'm familiar with), but not when I use it in a grep command. I realized I can't use the `!`, so after reading another question I tried this:

 grep -Proh "\b(\w+) \1\b" | grep -Prohv "?(?:one|two|three|four|five|six|seven|eight|nine|oh|zero)"

Which returns "grep: nothing to repeat". I'm unsure if I am using the correct grep parameters, or what is wrong with the regex I am using.

Sample data to match:
today to evaluate for possibilities. doubt that that is occurring

Sample data to ignore:
specific gravity one point zero zero seven

zakparks31191
  • use with -P `\b(?!(?:eight|f(?:ive|our)|nine|o(?:h|ne)|s(?:even|ix)|t(?:hree|wo)|zero))(\w+) \1\b` –  Mar 11 '15 at 16:26

1 Answer


Just -P or -oP would be enough.

$ grep -P '(?!(?:one|two|three|four|five|six|seven|eight|nine|oh|zero))\b(\w+) \1\b' file
today to evaluate for possibilities. doubt that that is occurring
$ grep -oP '(?!(?:one|two|three|four|five|six|seven|eight|nine|oh|zero))\b(\w+) \1\b' file
that that
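Since the question mentions testing the pattern as a Python expression, here is a minimal sketch of the same check with Python's `re` module, run against the two sample lines from the question. The names `NUMBER_WORDS`, `DOUBLED_WORD`, and `find_doubled_words` are made up for illustration, and Python's `re` is a different engine from the PCRE used by `grep -P`, so treat it as a cross-check rather than a replacement for the grep command.

```python
import re

# Number words that should not count as "doubled words" (list taken from the question).
NUMBER_WORDS = "one|two|three|four|five|six|seven|eight|nine|oh|zero"

# Same pattern as in the grep command: the negative lookahead rejects any match
# that would start with a number word, then \1 requires the captured word to repeat.
DOUBLED_WORD = re.compile(r"(?!(?:" + NUMBER_WORDS + r"))\b(\w+) \1\b")


def find_doubled_words(line):
    """Return every doubled-word match in the line, like grep -o would print."""
    return [m.group(0) for m in DOUBLED_WORD.finditer(line)]


samples = [
    "today to evaluate for possibilities. doubt that that is occurring",
    "specific gravity one point zero zero seven",
]

for line in samples:
    print(find_doubled_words(line))
# prints ['that that'] for the first line and [] for the second
```

Because the lookahead only checks how the word begins, it also skips doubled words that merely start with a number word, such as "ohm ohm".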
Avinash Raj
  • That works! I do need the other parameters to keep the data manageable, but adding them doesn't break anything. Comparing mine to yours, what ends up breaking things turns out to be single vs. double quotes. What's the difference? – zakparks31191 Mar 11 '15 at 15:34
  • I think the problem is mainly because of the additional `hv` parameters. – Avinash Raj Mar 11 '15 at 15:37
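
A note on the two open points in these comments, offered as a likely explanation rather than a confirmed diagnosis. The "nothing to repeat" error in the question probably comes from the second pattern, which begins with `?`: a quantifier with nothing in front of it to quantify. As for quoting, in an interactive bash session `!` inside double quotes triggers history expansion, which single quotes suppress. The sketch below uses Python's `re` as a stand-in for PCRE (an assumption; the two engines differ in other respects) to show that a leading `?` is rejected at compile time. The helper name `compiles` is hypothetical.

```python
import re


def compiles(pattern):
    """Return True if the pattern is a valid regex for Python's re module."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Second pattern from the question: starts with a bare "?" quantifier.
broken = "?(?:one|two|three|four|five|six|seven|eight|nine|oh|zero)"
# Same pattern with the stray "?" removed.
fixed = "(?:one|two|three|four|five|six|seven|eight|nine|oh|zero)"

print(compiles(broken))  # False
print(compiles(fixed))   # True
```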