9

I have a requirement to grep patterns from a file, but I need the matches in the order of the patterns.

$ cat patt.grep
name1
name2

$ grep -f patt.grep myfile.log
name2:some xxxxxxxxxx
name1:some xxxxxxxxxx

The output follows the order in which the matches occur in the log: name2 is found first, so it is printed first, and name1 is printed after it. My requirement is to get name1 first, following the order of the patterns in the patt.grep file.

I expect the output to be:

name1:some xxxxxxxxxx
name2:some xxxxxxxxxx
Sriharsha Kalluru
  • @devnull probably the patterns in patt.grep are unsorted, and the OP wants them sorted in the order in patt.grep. Unfortunately the example is likely misleading. – mockinterface Feb 20 '14 at 13:13
  • @mockinterface But it seems to be sorted; see the output of `$ cat patt.grep`. – Jayesh Bhoi Feb 20 '14 at 13:16
  • @JKB I bet that it is unsorted. Call it a hunch developed after years and years of reading the minds of requirement writers. Will go to sleep and find out in the morning :) – mockinterface Feb 20 '14 at 13:20
  • @Sriharsha Kalluru Try `$ grep -f patt.grep myfile.log | sort -u` – Jayesh Bhoi Feb 20 '14 at 13:21
  • sort -u would work for my script, but my requirement is to get the output in the order of the patterns listed in the patt.grep file. – Sriharsha Kalluru Feb 20 '14 at 13:26
  • See http://stackoverflow.com/questions/9936962/obtain-patterns-in-one-file-from-another-using-ack-or-awk-or-better-way-than-gre – Jayesh Bhoi Feb 20 '14 at 13:58
  • The output of this command is *in the order the lines appear in `myfile.log`*, not the order of the patterns in `patt.grep`. `name2` occurs in `myfile.log` before `name1` does. `grep` walks through the file-to-be-searched one line at a time and compares each line to all patterns. If you want it in pattern order, then you'll have to run `grep` repeatedly, once for each pattern. – twalberg Feb 20 '14 at 15:53
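
The behaviour described in the last comment can be reproduced with a small experiment. This is only a sketch: the scratch directory, the variable names, and the extra `other:unrelated` log line are assumptions made for illustration, and the sample data just mirrors the question.

```shell
# Work in a scratch directory so no real files are touched (illustrative setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Patterns in the desired output order.
printf '%s\n' name1 name2 > patt.grep

# Sample log in which name2 appears before name1.
printf '%s\n' 'name2:some xxxxxxxxxx' 'other:unrelated' 'name1:some xxxxxxxxxx' > myfile.log

# A single grep -f pass reports matches in log order.
log_order=$(grep -f patt.grep myfile.log)
printf 'grep -f:\n%s\n' "$log_order"

# Running grep once per pattern reports matches in pattern order.
pattern_order=$(while IFS= read -r ptn; do grep -e "$ptn" myfile.log; done < patt.grep)
printf 'one grep per pattern:\n%s\n' "$pattern_order"
```

The first result lists name2 before name1, the second lists name1 before name2.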

6 Answers

4

You can pipe patt.grep to xargs, which will pass the patterns to grep one at a time.

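By default, xargs appends its arguments to the end of the command, but here grep needs myfile.log as the last argument. The -I option solves this: it names a placeholder (hello in the command below; {} is the conventional choice) that xargs replaces with each input line, running grep once per line.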

cat patt.grep | xargs -Ihello grep hello myfile.log
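
A variant of the same idea, as a hedged sketch rather than a definitive form: it uses `{}` as the placeholder, reads patt.grep through a redirect instead of `cat`, and adds `-e` so that a pattern beginning with a dash is not taken for an option. It assumes the patterns contain no quotes or backslashes, which xargs would interpret itself. The scratch directory and sample data are made up for illustration.

```shell
# Illustrative setup in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' name1 name2 > patt.grep
printf '%s\n' 'name2:some xxxxxxxxxx' 'name1:some xxxxxxxxxx' > myfile.log

# -I{} makes xargs run grep once per input line, substituting that line for {}.
# -e marks the following argument as a pattern even if it starts with a dash.
result=$(xargs -I{} grep -e {} myfile.log < patt.grep)
printf '%s\n' "$result"
```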
owlman
  • If invoking `grep` for every line in `patt.grep` is tolerable in terms of performance, this is a simple and pragmatic solution. (I suggest using `{}` or something similarly abstract as the placeholder, though, to avoid confusion.) – mklement0 Apr 07 '15 at 01:58
1

Use the regexes in patt.grep one after another in order of appearance by reading line-wise:

while IFS= read -r ptn; do grep -e "$ptn" myfile.log; done < patt.grep
J. Katzwinkel
1

A simple workaround would be to sort the log file before grep:

grep -f patt.grep <(sort -t: myfile.log)

However, this might not yield results in the desired order if patt.grep is not sorted.

In order to preserve the order specified in the pattern file, you might use awk instead:

awk -F: 'NR==FNR{a[$0];next}$1 in a' patt.grep myfile.log
devnull
  • But I think the OP wants the output in the order given by the `patt.grep` file. – Jayesh Bhoi Feb 21 '14 at 08:30
  • @JKB Yes, the `awk` solution does preserve the order as in the pattern file. – devnull Feb 22 '14 at 20:41
  • Actually, the `awk` solution preserves the order as in the _log_ (input) file, not the pattern file; it is effectively the same as the OP's original command, `grep -f patt.grep myfile.log`. – mklement0 May 21 '15 at 03:44
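
The point made in the last comment can be checked directly. The sketch below runs the `awk` command from this answer next to the original `grep -f` command on made-up sample data (the scratch directory and variable names are assumptions) and compares the two results.

```shell
# Illustrative setup in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' name1 name2 > patt.grep
printf '%s\n' 'name2:some xxxxxxxxxx' 'name1:some xxxxxxxxxx' > myfile.log

# awk loads the patterns into an array, then scans the log top to bottom,
# so matching lines come out in log order.
awk_out=$(awk -F: 'NR==FNR{a[$0];next}$1 in a' patt.grep myfile.log)

# The command from the question, for comparison.
grep_out=$(grep -f patt.grep myfile.log)

if [ "$awk_out" = "$grep_out" ]; then
  echo "same order as grep -f"
else
  echo "different order"
fi
```

On this data both commands list name2 before name1.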
1

I tried the same scenario and solved it easily with the command below.

If your data is in the same format as shown in the question, you can use this:

grep -f patt.grep myfile.log | sort


Tajinder
1

This should do it

awk -F":" 'NR==FNR{a[$1]=$0;next}{ if ($1 in a) {print a[$1]} else {print $1, $1} }' myfile.log patt.grep > z

  • While this snippet may answer the question, we typically prefer some additional explanation as to how or why something is so. Can you provide an explanation? – Joshua Drake May 17 '17 at 19:21
  • Hi, sorry. First, myfile.log is split into columns using -F":". Then each line is loaded into the array a, keyed by its first column, with a[$1]=$0 (so the keys are name2 and name1, in that order). Next, for each word listed in the first (and only) column of patt.grep: if that word is a key in a, the entire stored line is printed; otherwise the missing word is printed twice. So if you add name3 to patt.grep, the output lines are, in order: "name1:some xxxxxxxxxx", "name2:some xxxxxxxxxx", "name3 name3". – Isidor Lipsch May 24 '17 at 02:16
0

This can't be done in grep alone.

For a simple and pragmatic, but inefficient solution, see owlman's answer. It invokes grep once for each pattern in patt.grep.

If that's not an option, consider the following approach:

grep -f patt.grep myfile.log |
 awk -F: 'NR==FNR { l[$1]=$0; next } $1 in l {print l[$1]}' - patt.grep
  • Passes all patterns to grep in a single pass,
  • then sorts them based on the order of patterns in patt.grep using awk:
    • first reads all output lines (passed via stdin, -, i.e., through the pipe) into an assoc. array using the 1st :-based field as the key
    • then loops over the lines of patt.grep and prints the corresponding output line, if any.

Constraints:

  • Assumes that all patterns in patt.grep match the 1st :-based token in the log file, as implied by the sample output data in the question.
  • Assumes that each pattern only matches once - if multiple matches are possible, the awk solution would have to be made more sophisticated.
mklement0