0

I have two different scripts to merge files by one matching column.

file1.tsv - 4 columns separated by tab

1 LAK c.66H>T    p.Ros49Kos
2 OLD c.11A+1>R  p.Ill1639Los
3 SRP c.96V-T>X  p.Zub%D23
4 HRP c.1S>T     p.Lou33aa

file2.tsv - 14 columns, separated by tab

LAK "empty_column" c.66H>T  ......
SRP "empty_column" c.96-T>X ......

Ouptut.tsv - all columns from file2.tsv and behind 1st column of file1 if match.

LAK "empty_column" c.66H>T  ......1
SRP "empty_column" c.96-T>X ......3

I am using these two scripts, but doesn´t work:

awk -v FILE_A="file1.tsv" -v OFS="\t" 'BEGIN { while ( ( getline <
FILE_A ) > 0 ) { VAL = $0 ; sub( /^[^ ]+ /, "", VAL ) ; DICT[ $3 ] =
VAL } } { print $0, DICT[ $3 ] }' file2.tsv

or

awk 'NR==FNR{h[$3] = $1; next} {print h[$3]}' file1.tsv file2.tsv

Thanks for help.

Hai Vu
  • 37,849
  • 11
  • 66
  • 93
Vonton
  • 2,872
  • 4
  • 20
  • 27

1 Answers1

1

You might want to use the join command to join column 2 of the first file with column 1 of the second:

join --nocheck-order -1 2 -2 1 file1.tsv file2.tsv

A few notes

  • This is the first step, after this, you still have the task of cutting out unwanted columns, or rearrange them. I suggest to look into the cut command, or use awk this time.
  • The join command expects the text on both files are in the same order (alphabetical or otherwise)
  • Alternatively, import them into a temporary sqlite3 database and perform a join there.
Hai Vu
  • 37,849
  • 11
  • 66
  • 93