2

I want to use my id_file to search my big_file extracting lines that match the id at the beginning of the line in big_file.

I'm a beginner and I'm struggling with grep (version grep (BSD grep) 2.5.1-FreeBSD) and understanding the solutions as cited below.

My id_file contains id's:

67b
84D
118
136
166

My big_file looks something like this:

118 ABL1_BCR
118 AC005258
166 HSP90AB1
166 IKZF2_SP
166 IL1RAP_D
136 ABL1_BCR
136 ABL1_BCR
555 BCR_136
555 BCR_136
555 BCR_136
59  UNC45B_M 166
59  WASF2_GN 166
59  YPEL5_CX 166

As suggested by Chris Seymour here

Try 1: I used

grep -wFf id_file big_file

That didn't work obviously, as the numbers occur elsewhere in the lines of the big_file.

Try 2: I modified the id_file;

^67b
^84D
^118
^136
^166

And ran grep -wFf id_file big_file again.

Of course, that didn't work either

I looked at batimar's take here but I'm failing to implement the suggestion.

Better usage is taking only some patterns from some file and this patterns use for your file

grep '^PAT' patterns.txt | grep -f - myfile

This will take all patterns from file patterns.txt starting with PAT and use this patterns from the next grep to search in myfile.

I tried to reproduce the code above with my example in several ways but apparently I just don't get what they mean there as none of it worked.

My tinkering had two outcomes: either "No such file or directory" or no output at all.

Is there even a way to do this with grep only?

I'd greatly appreciate it if anyone could break it down for me.

ilam engl
  • Your second try should work, perhaps you used it wrongly? Try `grep -f id_file large_file`. Also, see https://stackoverflow.com/questions/68403784/how-to-find-all-lines-which-contain-at-least-one-of-a-set-of-words-as-a-prefix for adding `^` dynamically instead of manually – Sundeep Jul 16 '21 at 09:56
  • @Sundeep, try 2 returns all lines that match the pattern, while I am looking for output that matches the pattern at the beginning of the line. – ilam engl Jul 16 '21 at 10:07
  • @ilamengl Sure, nothing in your question depends on the type of the underlying shell, e.g. `bash`. The problem, as I see it, is with the usage of `grep` and its flags. – Inian Jul 16 '21 at 10:12
  • @ilamengl no, it will return only matches from beginning of the line. Try this: `printf '^84D\n^136\n' > f1` followed by `printf '84D2\na84De\n13663\na136ds\n' > f2` and then `grep -f f1 f2` – Sundeep Jul 16 '21 at 10:17
  • @Sundeep I reran try 2 twice to be sure I'm not missing anything, but the output contains lines starting with 99, 202 and many other numbers, and they are present in other fields of the file. – ilam engl Jul 16 '21 at 10:27
  • `grep -f id_file large_file` gives me only first 7 lines, which is what you want.. make sure `id_file` is from try 2 which has `^` at the beginning – Sundeep Jul 16 '21 at 10:50
  • @Sundeep it is the file containing the ^ at the beginning; the output includes id 59, which also has 166 in the last field. Triple-checked. – ilam engl Jul 16 '21 at 11:04
  • Can't think of anything that could cause that issue for you, other than perhaps weird issues like your grep command is aliased to something else, your input files have CRLF line ending, etc. In any case, if you want to match exactly first field instead of matching beginning, try `awk 'NR==FNR{a[$1]; next} $1 in a' id_file large_file` (in this case id_file should not have the `^` character) – Sundeep Jul 16 '21 at 11:10
  • What is your `grep` version? I tested on `GNU grep`, perhaps you have a different `grep` which has some bug? For ex: https://unix.stackexchange.com/questions/352977/why-does-this-bsd-grep-result-differ-from-gnu-grep – Sundeep Jul 16 '21 at 12:48
  • @Sundeep that must be the issue `grep -V` `grep (BSD grep) 2.5.1-FreeBSD`. Your `awk` example worked! – ilam engl Jul 16 '21 at 13:31

2 Answers

1

This seems to be an issue with BSD grep. See https://unix.stackexchange.com/questions/352977/why-does-this-bsd-grep-result-differ-from-gnu-grep for similar issues.
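
Before blaming the grep build, the other causes floated in the comments (an aliased grep, CRLF line endings) can be ruled out quickly. The following is only a rough sketch: `crlf_ids` and `clean_ids` are made-up file names, and the sample file is generated here purely to show what a CRLF file looks like to the check.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Which grep actually runs (alias, function, or binary), and which version?
type grep
grep -V | head -n 1

# Hypothetical sample file with Windows (CRLF) line endings, for illustration only.
printf '118\r\n136\r\n' > crlf_ids

# Count lines that end in a carriage return; 0 would mean plain LF endings.
cr_lines=$(awk '/\r$/ {n++} END {print n+0}' crlf_ids)
echo "lines ending in CR: $cr_lines"

# Strip the carriage returns into a clean copy.
tr -d '\r' < crlf_ids > clean_ids
```

Running the same `awk` count on your real id_file and big_file tells you whether stray carriage returns could be interfering with the match.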

You can use awk as an alternative (there's probably a duplicate somewhere with this exact solution):

awk 'NR==FNR{a[$1]; next} $1 in a' id_file large_file
  • NR==FNR{a[$1]; next} builds an associative array with first field of id_file as keys
  • $1 in a will be true if first field of a line from large_file matches any of the keys in array a. If so, entire line will be printed.
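
To see the two passes at work, here is a small self-contained run against the sample data from the question. It is a sketch under a few assumptions: the files are recreated in a scratch directory, `big_file` is the question's name for what the command above calls large_file, and `filtered_output` is just a convenient output name.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Plain ids, without the ^ prefix.
printf '%s\n' 67b 84D 118 136 166 > id_file

cat > big_file <<'EOF'
118 ABL1_BCR
118 AC005258
166 HSP90AB1
166 IKZF2_SP
166 IL1RAP_D
136 ABL1_BCR
136 ABL1_BCR
555 BCR_136
555 BCR_136
555 BCR_136
59  UNC45B_M 166
59  WASF2_GN 166
59  YPEL5_CX 166
EOF

# First pass (NR==FNR, i.e. while reading id_file): store each id as an array key.
# Second pass (big_file): print lines whose first field is one of those keys.
awk 'NR==FNR{a[$1]; next} $1 in a' id_file big_file > filtered_output
cat filtered_output
```

Only the seven lines whose first field is 118, 166 or 136 are printed. The 555 and 59 lines are skipped even though 136 and 166 appear later in them, because only `$1` is compared.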
Sundeep
1

Using the id_file as described in the OP "Try 2"

^67b
^84D
^118
^136
^166

Then try this:

fname="id_file"; while IFS= read -r line; do grep "$line" big_file >> filtered_output; done < "$fname"
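
For comparison, the per-id loop can in principle be collapsed into a single grep call by generating the anchored patterns on the fly, as hinted at in the comments. This is only a sketch under assumptions: it starts from an id_file with plain ids (no `^`), the ids contain no regex metacharacters, `anchored_ids` is a made-up file name, the sample data is a shortened copy of the question's, and it relies on a grep that handles anchors in `-f` pattern files correctly (which the BSD grep 2.5.1 from the question apparently may not).

```shell
# Scratch directory with a shortened copy of the question's data.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf '%s\n' 67b 84D 118 136 166 > id_file
printf '%s\n' '118 ABL1_BCR' '166 HSP90AB1' '136 ABL1_BCR' \
              '555 BCR_136' '59  UNC45B_M 166' > big_file

# Turn each id into a pattern anchored at line start and followed by
# whitespace, so that 118 does not also match an id such as 1180.
sed 's/.*/^&[[:space:]]/' id_file > anchored_ids

# One grep call with all patterns; output keeps the order of big_file.
grep -f anchored_ids big_file > filtered_output
cat filtered_output
```

Unlike the loop, this keeps matching lines in their original file order rather than grouping them by id.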
ilam engl