I have two files wherein file1 spans around 6000 lns containing Id matching value and file2 1000lns containing ID. I want to retrieve value from file1 for only those IDs in file2.
file1.tsv
:
RtGene0001 rt_000005300
RtGene0002 rt_000005400
RtGene0003 rt_000005500
RtGene0004 rt_000005600
RtGene0005 rt_000005700
RtGene0006 rt_000005800
RtGene0007 rt_000005900
RtGene0008 rt_000006000
RtGene0008 rt_000006100
RtGene0009 rt_000006200
file2.tsv
:
RtGene0006
RtGene0007
RtGene0013
RtGene0014
Expected output:
RtGene0006 rt_000005800
RtGene0007 rt_000005900
I have tried few awk scripts but they keep showing in NA in the value column.
scripts:
awk 'BEGIN {FS=OFS="\t"} FNR==NR {a[$1]=$2; next} {print $1, (($1 in a)?a[$1]:"NAN")}' file1.tsv file2.tsv
Output I got:
RtGene00NAN
RtGene00NAN
RtGene00NAN
RtGene00NAN
RtGene00NAN
RtGene00NAN
RtGene00NAN
RtGene00NAN
RtGene00NAN
RtGene00NAN
awk -f filename1.awk file1.tsv file2.tsv
filename1.awk
BEGIN{FS=OFS="\t"}
FNR==NR{
a[$1]=$2
next
}
{ if ($1 in a) {print $1, a[$1]} else {print $1,"\t", "#NA"} }
Output I got:
RtGene0006 #NA
RtGene0007 #NA
Kindly help me address what I could be missing and give suggestions on how to work around the issue.