How to grep multiple strings within N lines

Question

I was wondering if there is any way to use grep (or any other command) to search for multiple strings that all occur within N lines of each other.

Example

Search for "orange", "lime", "banana" all within 3 lines

If the input file is

xxx
a lime
b orange
c banana
yyy
d lime
foo
e orange
f banana

I want to print the three lines starting with a, b, c. The lines with the searched strings can appear in any order.

I do not want to print the lines d, e, f, as there is a line in between, and so the three strings are not grouped together.

I felt free to propose an edit for your question. Does it make your question clearer? For the future, please provide clear sample input and use the available formatting options. — Martin Nyolt, Sep 13 '16 at 15:43
Must every string be matched exactly once? Or are three consecutive lines containing `banana` also a successful match? — Martin Nyolt, Sep 13 '16 at 15:44
Possible duplicate of [How to find patterns across multiple lines using grep?](http://stackoverflow.com/questions/2686147/how-to-find-patterns-across-multiple-lines-using-grep) — Krzysztof Kaszkowiak, Sep 13 '16 at 16:42
Hi @MartinNyolt, thanks for editing it. Every string should be matched once. Example: xxx banana banana banana is NOT a match, but xxx a banana, orange, lime and yyy a banana b orange, lime ARE matches. — Rafac13, Sep 13 '16 at 22:54
@KrzysztofKaszkowiak thanks for your suggestion but that is not quite what I wanted — Rafac13, Sep 13 '16 at 22:57
This is probably a duplicate, but you have not prescribed how to handle a number of corner cases, and your example is fuzzy, so it's hard to tell. If you are not satisfied with answers so far, see if you can [edit] your question to clarify it. — tripleee, Sep 14 '16 at 04:26

score 0 · Answer 1 · answered Sep 14 '16 at 04:22

Your question is rather unclear. Here is a simple Awk script which collects consecutive matching lines and prints them only if the run contains at least three lines.

awk '/orange|lime|banana/ { a[++n] = $0; next }
    { if (n>=3) for (i=1; i<=n; i++) print a[i]; delete a; n=0 }
    END { if (n>=3) for (i=1; i<=n; i++) print a[i] }' file

It's not clear whether you require all of your expressions to match; this one doesn't attempt to. If you see three successive lines with orange, that's a match, and will be printed.

The logic should be straightforward. The array a collects matches, with n indexing into it. When we see a non-match, we check how many lines the array holds, print them if there are 3 or more, then start over with an empty array and index. This is (clumsily) repeated at end of file as well, in case the file ends with a match.
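If every expression does have to appear at least once in the run, the script could be adapted roughly as follows. This is only a sketch under that assumption: the function names `all_in_run` and `flush` and the flags o, l and b are made up for illustration, and a run longer than three lines is still printed in full.

```shell
# Hypothetical helper: print a run of >= 3 consecutive matching lines,
# but only if orange, lime and banana each occur somewhere in that run.
all_in_run() {
    awk '
    function flush(   i) {
        if (n >= 3 && o && l && b)
            for (i = 1; i <= n; i++) print a[i]
        split("", a)            # portable way to empty the array
        n = o = l = b = 0
    }
    /orange|lime|banana/ {
        a[++n] = $0
        if (/orange/) o = 1     # remember which strings this run has seen
        if (/lime/)   l = 1
        if (/banana/) b = 1
        next
    }
    { flush() }                 # a non-matching line ends the run
    END { flush() }' "$@"
}
```

Call it as `all_in_run file`, or pipe input into it. Three successive lines with orange no longer count as a match, because lime and banana are missing from the run.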

If you want to permit gaps (say, one line which matches "orange" and "banana", then one which doesn't match, then one which matches "lime": should those three lines be printed? Your question is unclear), you could change the script to always keep an array of the last three lines. Then you also need to specify how to deal with, e.g., a sequence of five lines which matches by these rules.

score 0 · Answer 2 · answered Sep 14 '16 at 09:39

Similar to tripleee's answer, I would also use awk for this purpose. The main idea is to implement a simple state machine.

Simple example

As a simple example, first try to find three consecutive lines of banana. Consider the pattern-action statement

/banana/ { bananas++ }

For every line matching the regex banana, it increases the variable bananas (in awk, all variables are initialised with 0).

Of course, you want bananas to be reset to 0 when there is a non-matching line, so that the search starts over from the beginning:

/banana/ { bananas++; next }
{ bananas = 0 }

You can also test the values of variables, either in a pattern or inside an action. For example, if you want to print "Found" after three consecutive lines containing banana, extend the rule:

/banana/ {
    bananas++
    if (bananas >= 3) {
        print "Found"
        bananas = 0
    }
    next
}

This prints the string "Found" and resets the variable bananas to 0, so that counting starts over.
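Put together as a complete command, the counting rule and the reset rule might look like this. The wrapper name `count_bananas` and the printf sample input are only there for illustration.

```shell
# Hypothetical wrapper: print "Found" once for every three consecutive
# lines that contain banana.
count_bananas() {
    awk '
    /banana/ {
        bananas++
        if (bananas >= 3) {
            print "Found"
            bananas = 0
        }
        next
    }
    { bananas = 0 }             # any other line interrupts the streak
    ' "$@"
}

# Two bananas, an interruption, then three in a row.
printf '%s\n' banana banana xxx banana banana banana | count_bananas
```

With this input, only the last three lines form an uninterrupted streak, so "Found" is printed once.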

How to proceed further

Using this basic idea, you should be able to write your own awk script that handles all the cases. First, you should familiarise yourself with awk (pattern, actions, program execution).

Then, extend and adapt my example to fit your needs.

In particular, you probably need an associative array matched, with indices "banana", "orange", "lime".
You set matched["banana"] = $0 when the current line matches /banana/. This saves the current line for later output.
You clear that whole array when the current line does not match any of your expressions.
When all strings are found (matched[s] is not empty for every string s), you can print the contents of matched[s].

I leave the actual implementation to you. As others have said, your description leaves many corner-cases unclear. You should figure them out for yourself and adapt your implementation accordingly.
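For reference, one possible reading of the outline above is sketched below. Several choices here are assumptions rather than requirements from the question: the array where (which records line numbers so the "within 3 lines" test can be made), the function name `fruit_group`, fixed-string matching with index(), and clearing the state after a group has been printed so that groups do not overlap.

```shell
# Hypothetical helper following the outline: matched[s] holds the most recent
# line containing s; where[s] (an added assumption) holds its line number.
fruit_group() {
    awk '
    BEGIN { split("banana orange lime", want, " ") }
    {
        hit = 0
        for (k = 1; k <= 3; k++)
            if (index($0, want[k])) {
                matched[want[k]] = $0
                where[want[k]] = NR
                hit = 1
            }
        # A line without any of the strings clears everything.
        if (!hit) { split("", matched); split("", where); next }
        # Give up unless all three were seen at most two lines back.
        for (k = 1; k <= 3; k++)
            if (!(want[k] in where) || NR - where[want[k]] > 2) next
        # Print the saved lines in input order, each line only once.
        for (nr = NR - 2; nr <= NR; nr++)
            for (k = 1; k <= 3; k++)
                if (where[want[k]] == nr) { print matched[want[k]]; break }
        split("", matched); split("", where)
    }' "$@"
}
```

On the sample input from the question, `fruit_group file` should print only the lines starting with a, b and c. Whether groups may overlap, and what to do with runs longer than three lines, are exactly the corner cases you still need to decide.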

score 0 · Answer 3 · answered Sep 14 '16 at 10:39

I think you want this:

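awk '
  /banana/ { banana = 3 }   # each fruit stays "live" for three lines
  /lime/   { lime = 3 }
  /orange/ { orange = 3 }
  orange > 0 && lime > 0 && banana > 0 { print l2, l1, $0 }   # all three seen recently
  { orange--; lime--; banana--; l2 = l1; l1 = $0 }            # age counters, shift saved lines
' OFS='\n' yourFile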

So, if you see the word banana you set banana=3, so it stays valid for three lines: the current one and the next two. Likewise, if you see lime, give it 3 lines of chances to make a group, and similarly for orange.

Now, if all of orange, lime and banana have been seen within the last three lines (the current line and the two before it), print the second to last line (l2), the last line (l1) and the current line $0. Because OFS is set to a newline, the three lines are printed one per line.

Finally, decrement the count for each fruit before moving to the next line, and shift the saved lines back: the old l1 becomes l2 and the current line becomes l1.
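The same idea could be generalised from 3 to N lines by replacing l1 and l2 with a small ring buffer. The following is a sketch, not a tested drop-in: the function name `within_n`, the shell variable win, and the awk variables n and buf are invented for illustration. Like the script above, it prints the whole window of lines, and overlapping windows are printed more than once.

```shell
# Hypothetical generalisation. Usage: within_n N [file...]
within_n() {
    win=$1; shift
    awk -v n="$win" '
    /banana/ { banana = n }     # each fruit stays live for n lines
    /lime/   { lime = n }
    /orange/ { orange = n }
    { buf[NR % n] = $0 }        # ring buffer holding the last n lines
    orange > 0 && lime > 0 && banana > 0 {
        for (i = NR - n + 1; i <= NR; i++)
            if (i > 0) print buf[i % n]   # skip slots before the first line
    }
    { orange--; lime--; banana-- }' "$@"
}
```

For example, `within_n 3 yourFile` should behave like the script above on the sample input, printing the lines starting with a, b and c.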
