
I have a bunch of files on a Linux machine. I want to find out whether any of those files contain the string `foo123 bar`, with the condition that the string `foo123` must not appear anywhere before that `foo123 bar`.

Plot twist: I want the search to do this for any number instead of "123", without me having to specify a specific number.

How can I do that?

Ram Rachum
  • Does "before" mean "immediately before"? – logi-kal May 21 '17 at 13:40
  • @horcrux No, at any point from the beginning of the file. – Ram Rachum May 21 '17 at 13:47
  • With almost 20k rep you should know that we like code, what have you tried so far? – Pedro Lobito May 21 '17 at 13:51
  • The regex should be `(?<!foo\d+.*)foo\d+ bar`, but in grep you cannot use a non-fixed-length negative lookbehind. – logi-kal May 21 '17 at 14:00
  • Please add the tools / programming language you're able to use. You could achieve it with an infinite lookbehind, supported by, e.g., the newer `regex` module in `Python`. – Jan May 21 '17 at 15:37
  • See a solution with an [infinite lookbehind](http://regexstorm.net/tester?p=%28%3f%3c!foo123%28%3fs%3a.*%29%29foo123+bar%23%5cw%2b&i=I+have+a+bunch+of+files+on+a+Linux+machine.+I+want+to+find+whether+any+of+those+files+have+the+string+foo123+bar%2312%2c+AND+the+string+foo123+must+not+appear+before+that+foo123+bar%2334+.%0d%0a%0d%0aPlot+twist%3a+I+want+the+search+to+do+this+for+any+number+instead+of+%22123%22%2c+without+me+having+to+specify+a+specific+number.%0d%0a%0d%0aHow+can+I+do+that%3f) – Jan May 21 '17 at 16:00
  • You could reverse the string and then do a negative lookahead. – David Knipe May 21 '17 at 16:34
  • @Jan in grep you can specify the singleline mode as parameter. And the point remains the same: "infinite" lookbehind is not allowed. – logi-kal May 21 '17 at 17:56
  • @RamRachum I think you cannot do this within the shell, you have to use Jan's regular expression in a simple C# script. – logi-kal May 21 '17 at 17:57

2 Answers


A solution with Python's newer regex module:

```python
import regex as re

string = """
I have a bunch of files on a Linux machine. I want to find whether any of those files have the string foo123 bar#12, AND the string foo123 must not appear before that foo123 bar#34 .
Plot twist: I want the search to do this for any number instead of "123", without me having to specify a specific number.
How can I do that?
"""

rx = re.compile(r'(?<!foo\d(?s:.*))foo123 bar#\w+')

print(rx.findall(string))
# ['foo123 bar#12']
```

This makes use of the `regex` module's variable-length ("infinite") lookbehind and the scoped single-line mode `(?s:.*)`, which lets `.` match newlines as well.

Jan
  • This looks good, except one thing I probably didn't specify clearly enough: The number after the `foo` needs to be the same in both instances. (That was actually the part I really didn't know how to do.) – Ram Rachum May 22 '17 at 18:19
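For the same-number requirement raised in the comment above, one option that stays within Python's standard `re` module is the reversal idea suggested in the comments on the question: reverse the text, so that "not preceded by" becomes "not followed by" and an ordinary negative lookahead with a backreference is enough. This is only a sketch; the name `first_occurrences` is made up for illustration, and it assumes each file is small enough to hold in memory as a single string.

```python
import re

# Works on the REVERSED text, where "foo<N> bar" reads "rab <N reversed>oof".
# The negative lookahead rejects a hit if the same reversed "foo<N>" shows up
# later in the reversed text, i.e. earlier in the original text.
# (?<!\d) stops a longer number such as foo1234 from counting as foo123.
REVERSED_RX = re.compile(r'rab (\d+)oof(?!.*(?<!\d)\1oof)', re.DOTALL)


def first_occurrences(text):
    """Return each 'foo<N> bar' in text that has no 'foo<N>' before it."""
    reversed_text = text[::-1]
    hits = [m.group(0)[::-1] for m in REVERSED_RX.finditer(reversed_text)]
    hits.reverse()  # restore the original left-to-right order
    return hits
```

For the input `foo123 foo456 bar foo123 bar` this returns `['foo456 bar']`: `foo456 bar` has no earlier `foo456`, while `foo123 bar` is preceded by `foo123`.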

Well, that's a tricky one. Here's an imperfect solution:

```shell
grep . -Prle '(?s)(?<ref>foo\d+)\b(?! bar).*\k<ref>(*SKIP)(*FAIL)|foo\d+ bar'
```

Why is it imperfect? Because if you have a file containing foo123 foo456 bar foo123 bar, it won't detect the foo456 bar part. If this situation cannot happen in your set of files, then I suppose you're fine.

This makes use of the `(*SKIP)(*FAIL)` trick; once you know it, the rest of the pattern should be pretty clear.

So maybe plain regex isn't the best solution here. Let's just write a one-liner script instead:

```shell
find . -type f -execdir perl -e 'while(<>) { while(/foo(\d+)( bar)?/g) { if ($2) { exit 0 if !$n{$1} } else { $n{$1} = 1 } } } exit 1;' {} \; -print
```

That one does the job and is hopefully more understandable :)
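The same bookkeeping can be sketched in Python if Perl is not at hand. This is an illustrative port rather than a drop-in replacement: the names `has_fresh_foo_bar` and `matching_files` are hypothetical, and reading each file whole as text while ignoring undecodable bytes is an assumption about the data.

```python
import re
from pathlib import Path

# Same pattern as the Perl one-liner: group 1 is the number,
# group 2 is " bar" when it directly follows the number.
FOO_RX = re.compile(r'foo(\d+)( bar)?')


def has_fresh_foo_bar(text):
    """True if some 'foo<N> bar' occurs with no earlier bare 'foo<N>'."""
    seen = set()  # numbers already met in a 'foo<N>' without ' bar'
    for match in FOO_RX.finditer(text):
        number, bar = match.group(1), match.group(2)
        if bar:
            if number not in seen:
                return True
        else:
            seen.add(number)
    return False


def matching_files(root):
    """Yield every regular file under root whose content qualifies."""
    for path in sorted(Path(root).rglob('*')):
        if path.is_file():
            if has_fresh_foo_bar(path.read_text(errors='ignore')):
                yield path
```

Calling `list(matching_files('.'))` would then play the role of the `find ... -print` command above.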

Lucas Trzesniewski