2

I have a file with the following pattern.

Foo $var1
.........
.........

Foo $var2 
..........
..........
..........
Yes

I would only like to match the "section" which starts with "Foo" and contains "Yes". (Note that each section ends with an empty line.)

The expected output should be.

Foo $var2 
..........
..........
..........
Yes

I tried

pcregrep -M "^Foo(.|\n)*^Yes"

Unfortunately, this starts matching at an earlier section and lumps every preceding section together with the one that has the "Yes". So instead of a single section that starts with "Foo" and has "Yes", I also get all the earlier sections that start with "Foo".

My dilemma is how to discard the previous match if I matched "Foo" but did not see "Yes" by the end of the section.

I tried to use a lookbehind, but a lookbehind cannot be variable-length.

dochenaj
  • Please add your desired output (no description) for that sample input to your question (no comment). – Cyrus Sep 13 '19 at 20:01

3 Answers

1

You could match Foo at the start of a line, followed by all lines that do not start with either Yes or Foo, and then a line that starts with Yes.

If Foo and Yes should not be part of a larger word, you could use a word boundary \b.

^Foo\b.*(?:\n(?!Yes\b|Foo\b).*)*\nYes\b

In parts

  • ^ Start of a line (multiline matching, as used by pcregrep -M)
  • Foo\b.* Match Foo followed by 0+ times any char except a newline
  • (?: Non capturing group
    • \n Match newline
    • (?!Yes\b|Foo\b) Negative lookahead, assert not Yes or Foo directly on the right
    • .* Match any char 0+ times except a newline
  • )* Close group and repeat 0+ times
  • \nYes\b Match a newline, then Yes followed by a word boundary


For example

pcregrep -Mo '^Foo\b.*(?:\n(?!Yes\b|Foo\b).*)*\nYes\b' file

Output

Foo $var2
..........
..........
..........
Yes
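
The same discard logic can also be spelled out line by line. The following is only a sketch in plain awk, not a replacement for the pattern above: the function name foo_yes_blocks and the temporary sample file are made up for illustration, and it assumes that "Foo" and "Yes" appear at the very start of a line. A new "Foo" line throws away whatever was buffered, and the buffer is printed only once a "Yes" line is reached.

```shell
#!/bin/sh
# Hypothetical helper: print each run of lines that starts at a "Foo" line
# and ends at the next "Yes" line, with no other "Foo" line in between.
foo_yes_blocks() {
    awk '
        # A new Foo line starts a fresh buffer and discards any unfinished one.
        /^Foo/ { buf = $0; in_section = 1; next }
        # A Yes line closes the section: print what was buffered, then the Yes line.
        in_section && /^Yes/ { print buf; print; in_section = 0; next }
        # Any other line inside an open section is appended to the buffer.
        in_section { buf = buf "\n" $0 }
    ' "$1"
}

# Sample input modelled on the question (the heredoc is quoted so $var1 stays literal).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Foo $var1
.........
.........

Foo $var2
..........
..........
..........
Yes

EOF

foo_yes_blocks "$sample"
```

Unlike pcregrep -o, this prints the whole "Yes" line rather than just the matched text, and it does not apply the \b word boundaries.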
The fourth bird
  • This works. But I had to take out the word boundaries. Finally reduced it to pcregrep -Mo '^Foo.*(?:\n(?!Foo).*)*\nYes'. – dochenaj Sep 14 '19 at 14:31
0

If you can use GNU awk instead, you can make awk work in block mode by setting the record separator to Foo, like this:

awk -v RS='Foo' -v ORS='' '/Yes/ {print RS$0}' file
Foo $var2
..........
..........
..........
Yes
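
Since the question says every section ends with an empty line, awk's paragraph mode (an empty RS) may be another option that does not depend on a multi-character record separator. This is a sketch under that assumption only: the function name yes_sections and the sample file are hypothetical, and it will not work if sections are not separated by blank lines.

```shell
#!/bin/sh
# Hypothetical helper: with RS set to the empty string, awk reads
# blank-line-separated blocks as single records. Print the blocks that
# start with "Foo" and contain a line starting with "Yes".
yes_sections() {
    awk -v RS='' '/^Foo/ && /\nYes/' "$1"
}

# Sample input modelled on the question (quoted heredoc keeps $var1 literal).
sample_file=$(mktemp)
cat > "$sample_file" <<'EOF'
Foo $var1
.........
.........

Foo $var2
..........
..........
..........
Yes

EOF

yes_sections "$sample_file"
```

If several sections match, they are printed back to back; adding -v ORS='\n\n' would put a blank line after each one.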
Jotne
-1

If you're using a PCRE-enabled grep, something like this will get just those Foo sections that have a Yes.

Note that I am not sure whether grep will match across lines. It probably can, but I have not verified it myself.

(?m)^Foo\K(?:(?!^Foo)[\S\s])+(?=^Yes)

https://regex101.com/r/HCrcGO/1

Expanded

 (?m)
 ^ Foo
 \K 
 (?:
      (?! ^ Foo )
      [\S\s] 
 )+
 (?= ^ Yes )
  • How do I adapt that pattern to pcregrep? – dochenaj Sep 14 '19 at 00:22
  • @henrychiedozie - I'm not sure, other than it works under PCRE. Maybe just give it a try ? –  Sep 14 '19 at 00:46
  • It doesn't work. It shows this error. pcregrep: Error in command-line regex at offset 11: unrecognized character after (? or (?- – dochenaj Sep 14 '19 at 02:53
  • But I suppose the way to make it work is to combine a positive lookahead and a negative lookahead. But the pattern to get it to work is what I'm still trying to figure out. – dochenaj Sep 14 '19 at 02:59
  • @dochenaj - Here is some info on the whole PCREGREP thing, have a nice day! https://www.pcre.org/original/doc/html/pcregrep.html OPTION SETTING - `(?m) sets multiline matching. The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and PCRE_EXTENDED options (which are Perl-compatible) can be changed from within the pattern by a sequence of Perl option letters enclosed between "(?" and ")"` MATCH POINT RESET - `\K Reset start of match` –  Sep 14 '19 at 17:11
  • @dochenaj - Don't think I left you in the dark. I answered all your questions but SO deleted them all. It was 2 or 3 of them. Sorry buddy, you picked the wrong answer. I downvoted your question and the answer you chose... –  Sep 15 '19 at 21:56