I have one file, file1.txt, with data formatted like so:

EMERGE-3 16218877 0 0 2 -9
EMERGE-3 16230920 0 0 1 -9
EMERGE-8 16220003 0 0 1 -9
EMERGE-9 16231695 16220014 16220010 1 -9
EMERGE-11 16218001 0 0 1 -9

I have another file, file2.txt, with a list of IDs formatted like:

16230920
16220014
16218001
16218877

I would like to search for these IDs only in column 2 of file1.txt, so the output would look something like:

somecommand file1 file2

EMERGE-3 16230920 0 0 1 -9
EMERGE-11 16218001 0 0 1 -9
EMERGE-3 16218877 0 0 2 -9

(Notice the line EMERGE-9 16231695 16220014 16220010 1 -9 was not included in the output). This is the main issue I am having right now. If I perform the command:

grep -f file2.txt file1.txt

the output includes the line EMERGE-9 16231695 16220014 16220010 1 -9, because the ID 16220014 appears in the 3rd column of file1.txt. I want to avoid including this line in the output, i.e., I only want to search for the IDs in column 2 of file1.txt.

Levi Arista
jesseaam
  • use awk... here's a resource (http://backreference.org/2010/02/10/idiomatic-awk/) mentioned in https://stackoverflow.com/tags/awk/info that'll help you for this case – Sundeep Apr 09 '18 at 15:43

1 Answer

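The following awk command should do what you want: it loads the IDs from file2.txt into an array, then prints only the lines of file1.txt whose second column is one of those IDs.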

awk 'FNR==NR{a[$0];next} ($2 in a)' file2.txt file1.txt
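
To see why this works, here is a commented sketch of the same two-file idiom, run against the sample data from the question. The temporary directory, the variable names (tmpdir, result, ids), and the use of $1 instead of $0 for the key (which tolerates stray whitespace around an ID) are my own choices for illustration, not part of the one-liner above.

```shell
# Recreate the question's sample data in a throwaway directory (illustrative setup).
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1.txt" <<'EOF'
EMERGE-3 16218877 0 0 2 -9
EMERGE-3 16230920 0 0 1 -9
EMERGE-8 16220003 0 0 1 -9
EMERGE-9 16231695 16220014 16220010 1 -9
EMERGE-11 16218001 0 0 1 -9
EOF

cat > "$tmpdir/file2.txt" <<'EOF'
16230920
16220014
16218001
16218877
EOF

result=$(awk '
    # While reading the first file (the ID list), FNR equals NR.
    # Store each ID as an array key, then move on to the next line.
    FNR == NR { ids[$1]; next }
    # For the second file, print the line only when column 2 is a stored ID.
    $2 in ids
' "$tmpdir/file2.txt" "$tmpdir/file1.txt")

printf '%s\n' "$result"
```

Note that the matching lines come out in the order they appear in file1.txt, not in the order of the IDs in file2.txt, and that the EMERGE-9 line is skipped because 16220014 only occurs in its third column. The FNR==NR test also assumes file2.txt is not empty; with an empty ID list it would be true for file1.txt as well.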
RavinderSingh13
  • You have rather quickly accrued a fairly substantial amount of reputation here. Perhaps it would be time to start learning how to find suitable duplicates and why posting near-identical answers does not help improve the quality of this site. The DRY principle applies here, too. – tripleee Apr 09 '18 at 15:56
  • @tripleee, sure sir will try to do so. – RavinderSingh13 Apr 09 '18 at 16:00