I have 2 files, like these: file1.txt
chr pos ref totoal A C T G
chr1 2488104 A 8690 99.954 0.0230 0 0.0230
chr1 2488105 T 8847 0 99.954 0.022 0.0226
chr1 2488106 G 8902 0.011 0.0337 0 99.955
chr1 2488107 G 8875 0 0.02252 0 99.977
chr1 2488108 A 8674 99.919 0 0.011 0.0691
chr1 2488109 G 9116 0.021 0.0658 0 99.912
chr1 2488110 C 9191 0.087 0.0652 99.847 0
chr1 2488111 C 9291 0 0.0430 99.956 0
chr1 2488112 T 9254 0 100 0 0
chr1 2488113 C 9354 0 0.0427 99.957 0
chr1 2488114 C 9493 0 0.0842 99.915 0
and a file2.txt:
chr1 2488111 T 0.09
chr1 2488105 C 0.053
chr1 2488115 G 0.03
I would like to extract the numeric value of columns A-D of file1 if $1 and $2 matches between the two files. The choice of what column extract should depend on the $3 field of file2 (i.e. for "chr1 2488111 T" i would have back "99.956" since it is the value reported for chr1 position 2488111 in the column "T".
I found bunch of codes to match the fields between the two files, but I cannot have back just my value of interest. The AWK is the following:
awk '
BEGIN {FS = OFS = "\t"
}
NR==FNR {for (n = split ($2, a); n>0; n--) CMP[$1, a[n]]
next
}($1,$2) in CMP {print $0
}
' file2.txt file1.txt
I would like to have back a single value extracted from file1 based on the match between fields and the match between the $3 of file2 and the header of file1.
The ideal output would be to report the original line of file2 with the value appended at the end of the line:
chr1 2488111 T 0.09 99.956
chr1 2488105 C 0.053 0
chr1 2488115 G 0.03 NA