File searching and proximity search

Question

I've been looking at the question "grep a file, but show several surrounding lines?".

I'm using a Bash terminal and looking for a file which:

Has both path and redirect on any one line
Has flash on a nearby line, within five lines of the first

Is this possible with grep, ag, Perl, sed or any tool you guys know of?

+/- 5 lines is close... I'm hoping that whatever the tool is, it accepts a parameter that lets me specify the closeness — american-ninja-warrior, Jan 18 '17 at 16:06
the words "path" and "redirect" appear on the same line, and the word "flash" appears on a "close by" line — american-ninja-warrior, Jan 18 '17 at 16:12
Is the data structured in any way? E.g. is it parsable as JSON, YAML or XML? — Sobrique, Jan 18 '17 at 16:13
for the purpose of this discussion, let's say it's an unstructured, unparsable, verbose log file — american-ninja-warrior, Jan 18 '17 at 16:14
Yes, this is trivial to do, but edit your question to include concise, testable sample input and expected output so you don't end up with a solution to a problem you don't have (or a really bad "solution" to a problem you do have!). As of now you haven't even told us what your output should be (file names? blocks of lines from matching files? something else?), never mind shown us an example. Read "How to Ask" and provide the missing minimal, complete, and verifiable example, including concise, testable sample input and expected output plus what you've tried so far. — Ed Morton, Jan 18 '17 at 16:24

score 1 · Accepted Answer · edited May 23 '17 at 12:24

The easier filter is the one with "flash". It is also good to apply it first, so that the more expensive pattern matching is done only on the subset of lines that grep returns.

For this, just say:

grep -RH -C 5 "text" *

This recursively (-R) searches for the pattern "text" (here, "flash") and prints the file name (-H) on each output line. It also prints the 5 lines before and after each match (-C 5). Replace 5 with a variable if you want the distance to be configurable.
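A minimal sketch of keeping the distance in a variable, assuming Bash and a grep that supports -C (GNU and BSD grep both do). The `context_search` function name and the sample file are hypothetical, made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: pass the context distance as a parameter instead of hard-coding 5.
# context_search is a hypothetical helper; the sample file is made up.
set -euo pipefail

# Usage: context_search PATTERN DISTANCE FILE...
# Prints each match plus DISTANCE lines before and after it, prefixed with
# the file name (":" after the name for matches, "-" for context lines).
context_search() {
    local pattern=$1 distance=$2
    shift 2
    grep -RH -C "$distance" -- "$pattern" "$@"
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' one two flash three four > "$tmp"

# One line of context on either side of "flash".
context_search flash 1 "$tmp"
```

Note that context lines are marked with "-" rather than ":" after the file name, which matters if you later split grep's output on ":".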

Then use awk to check for both patterns:

awk '/pattern1/ && /pattern2/ {print FILENAME}' file

awk is well suited to this because it makes it easy to require several patterns to match on the same line.
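As a small illustration of the awk one-liner on real files, here is a sketch with two made-up sample files. The `files_with_both` function name is hypothetical, and the `!seen[FILENAME]++` guard is an addition (not in the answer) so that each file name is printed only once even if several lines match:

```shell
#!/usr/bin/env bash
# Sketch: list files that contain a line matching both /path/ and /redirect/.
# files_with_both and the sample files are hypothetical.
set -euo pipefail

files_with_both() {
    # !seen[FILENAME]++ is true only the first time a file name is seen,
    # so a file with several matching lines is reported once.
    awk '/path/ && /redirect/ && !seen[FILENAME]++ { print FILENAME }' "$@"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'path then redirect' 'unrelated' > "$dir/a.log"
printf '%s\n' 'path only' 'redirect only' > "$dir/b.log"

# Only a.log has both words on the same line.
files_with_both "$dir/a.log" "$dir/b.log"
```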

Since we don't have a file to hand to awk, but rather a stream of lines from grep, each prefixed with its file name, we can use a basic Bash loop to handle grep's output:

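# GNU grep: -Z prints a NUL after each file name instead of ":" (matching
# lines) or "-" (context lines), so the name splits off cleanly either way;
# --no-group-separator drops the "--" lines between context blocks.
while IFS= read -r -d '' filename && IFS= read -r data
do
    awk -v f="$filename" '/path/ && /redirect/ {print f}' <<< "$data"
done < <(grep -RHZ -C 5 --no-group-separator "text" *) | sort -u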

Does this end up having to read the file twice? First to grep, second to awk? — Sobrique, Jan 18 '17 at 16:27
@Sobrique Not exactly. It does two passes, but the second pass only searches the 10 lines surrounding each instance of the word "text". This would be pretty inefficient if "text" appeared on every single line, so obviously the assumption is that it doesn't appear frequently. — ThisSuitIsBlackNot, Jan 18 '17 at 16:53

score 1 · Answer 2 · answered Jan 18 '17 at 16:29

ack -A6 -B4 'path.*redirect|redirect.*path' FILES | grep flash

This outputs every line containing flash that lies within the 4 lines before or the 6 lines after a line in FILES containing both path and redirect, together with the file name and the line number of the line containing flash.

If you don't have ack (egrep, or grep -E, would also work, since both understand the | alternation), you can rephrase this as two grep commands:

(grep -A6 -B4 'path.*redirect' FILES ; grep -A6 -B4 'redirect.*path' FILES) |
    grep flash
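With grep -E the two grep calls can be collapsed back into one, since -E supports the | alternation. This is a sketch assuming GNU or BSD grep (-A and -B are extensions, not POSIX); -H and -n are added here so the output carries the file name and line number, as the ack version does. The `near_flash` function name and the sample file are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: a single grep -E replaces the two grep calls.
# near_flash and the sample file are hypothetical.
set -euo pipefail

near_flash() {
    # Lines within 4 before / 6 after a path+redirect line, filtered to flash.
    grep -E -Hn -A6 -B4 'path.*redirect|redirect.*path' "$@" | grep flash
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'start' 'flash here' 'redirect to path' 'x' 'y' > "$tmp"

# Line 2 is a context line (1 line before the match), so it is
# printed as FILE-2-flash here.
near_flash "$tmp"
```

One caveat shared with the original pipeline: the final grep flash also matches if the file name itself contains "flash".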

score 0 · Answer 3 · answered Jan 18 '17 at 16:24

This is a bit more complicated than it seems, because you're looking for words in rough proximity.

So I'd probably tackle it a bit like this:

#!/usr/bin/env perl

use strict;
use warnings;

my $buffer_limit = 5; # +/- 5

my @buffer; 

my $first_flag;
my $second_flag; 

#iterate stdin or files specified on command line
while ( my $line = <> ) {

   #test first condition
   if ( $line =~ m/path/ and $line =~ m/redirect/ ) { $first_flag++; };
   #test second condition
   if ( $line =~ m/flash/ ) { $second_flag++; };

   #if either is true - match has been seen recently. 
   #save the line in the buffer. 
   if ( $first_flag or $second_flag ) { 
         push @buffer, $line
   }
   #if both are true, we print (and flush the buffer)
   if ( $first_flag and $second_flag ) { 
       print "Match found up to line $.:\n";
       print @buffer;
       @buffer = ();
       $first_flag = 0;
       $second_flag = 0; 
   }
   #exceeding limit means that both matches haven't been seen in proximity. 
   if ( @buffer > $buffer_limit ) { 
      @buffer = ();
      $first_flag = 0;
      $second_flag = 0;
   }
}

We use a 5-line buffer. We start capturing when we hit one 'match' or the other, and we print and flush the buffer when we hit the second match. If the buffer grows beyond 5 lines before that happens, we empty it and reset both flags.

I take it you don't like statement modifiers? Also, you're saving lines in `@buffer` just to count them (the OP just wants a pass/fail on each file), so why not set your two flags to `$buffer_limit` and decrement them on each line if they're non-zero? — Borodin, Jan 18 '17 at 19:54
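A sketch of that countdown idea, written in awk rather than Perl to match the earlier answers. Each kind of match resets its own countdown, so a newer match is never thrown away along with an older one, and the two lines can come in either order. It reads each file once and reports each matching file once. The `proximity_files` function name and the sample files are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of the countdown approach from the comment above, in awk.
# proximity_files and the sample files are hypothetical.
set -euo pipefail

# Usage: proximity_files LIMIT FILE...
# Prints each file with a path+redirect line and a flash line at most
# LIMIT lines apart (in either order).
proximity_files() {
    local limit=$1
    shift
    awk -v limit="$limit" '
        FNR == 1 { a = 0; b = 0 }          # new file: forget earlier matches
        {
            if (a > 0) a--                 # age both countdowns by one line
            if (b > 0) b--
            if (/path/ && /redirect/) a = limit + 1
            if (/flash/)              b = limit + 1
            # both countdowns still running => the matches are close enough
            if (a > 0 && b > 0 && !seen[FILENAME]++) print FILENAME
        }' "$@"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'path and redirect' a b c d 'flash' > "$dir/near.log"  # 5 apart
printf '%s\n' 'flash' a b c d e 'redirect, path' > "$dir/far.log"    # 6 apart

proximity_files 5 "$dir/near.log" "$dir/far.log"
```

Setting the countdown to limit + 1 on the matching line means it is still non-zero exactly limit lines later, so "within five lines" includes a distance of five.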
