1

I want to compare 2nd column of file2 with 1st column of file1. If they are equal i want to add the 2nd column of file1 to file2 as shown in output.txt.

file2

chr5    ENST00000514151    utr5    0    +
chr5    ENST00000512281    utr5    0    +
chr5    ENST00000512281    utr5    0    +
chr5    ENST00000512281    utr5    0    +

file1

ENST00000512281    a
ENST00000504031    b
ENST00000776348    c

output.txt

chr5    a    ENST00000512281    utr5    0    +
chr5    a    ENST00000512281    utr5    0    +
chr5    a    ENST00000512281    utr5    0    +

I was able compare the files with

awk 'NR==FNR{a[$1];next}$2 in a{print}' file1 file2

This gives below output:

chr5    ENST00000512281    utr5    0    +
chr5    ENST00000512281    utr5    0    +
chr5    ENST00000512281    utr5    0    +

But I do not know how to add the 2nd colum of file1 into the output.

Thor
  • 45,082
  • 11
  • 119
  • 130
Marjer
  • 1,313
  • 6
  • 20
  • 31

1 Answers1

2

You can store the value of $2 in file1 into the array using a[$1]=$2. So you could try:

awk '
   NR==FNR{ 
     a[$1]=$2 ; next }
   $2 in a {
     $1=$1 FS a[$2]
     print 
   }' file1 file2

Output:

chr5 b ENST00000504031 utr5 0 +
chr5 b ENST00000504031 utr5 0 +
chr5 a ENST00000512281 utr5 0 +
chr5 a ENST00000512281 utr5 0 +
chr5 a ENST00000512281 utr5 0 +

Explanation:

  • This modifies $1 in file2 using $1=$1 FS a[$2] where FS is the default field separator, which is a space.. and then rebuilds the record, such that it can be printed by print later..
  • The print can be simplified to a 1 if desired.. Like $2 in a { $1=$1 FS a[$2] }1
  • Note that this rebuilds the record in file2 and thus any sequences of spaces or tabs will be truncated to a single space in the output. To keep the original formatting in file2 one could use the split() function in Gnu Awk version 4..
Håkon Hægland
  • 39,012
  • 21
  • 81
  • 174
  • Hi @Håkon Hægland, I'm getting a error while executing `awk 'NR==FNR{a[$1]=$2;next}$2 in a {$1=$1 FS a[$2] print }' file1 file2` it's not working – Marjer Jan 27 '14 at 00:11
  • Thanks @Håkon Hægland,what needs to be done if i want to print the 2nd column of file 1 in output.txt as third column. Just curious – Marjer Jan 27 '14 at 00:27
  • You are welcome. I think you can just replace `$1` with `$2` in `$1=$1 FS a[$2]` to move it to the third column instead of the second.. (i.e.: `$2=$2 FS a[$2]`) – Håkon Hægland Jan 27 '14 at 00:35