I am having a problem accessing the columns of a file in awk. I have two whitespace-separated files: one has 12 columns and the other has 5.
1.txt
chr1 10 20 . . + chr1 30 40 ABC . +
chr2 11 22 . . + chr2 90 92 XXX . -
chrX 33 42 . . + chrX 70 80 XXX . +
chr4 3 12 . . + chr4 70 80 ZZZ . +
And,
2.txt
1 chr1 30 40 ABC
3 chr1 35 40 ABC
27 chr2 90 92 XXX
1 chrX 70 80 XXX
2 chrY 12 13 XXX
I want to compare the 2nd, 3rd, 4th and 5th columns of 2.txt with the 7th, 8th, 9th and 10th columns of 1.txt.
If there is a match, it should print the whole line of 1.txt, followed by the 1st column of 2.txt.
Expected output:
chr1 10 20 . . + chr1 30 40 ABC . + 1
chr2 11 22 . . + chr2 90 92 XXX . - 27
chrX 33 42 . . + chrX 70 80 XXX . + 1
Since I could not get the four-column comparison working, I tried it with two columns. I am able to compare two columns from each file (the 2nd and 3rd of 2.txt against the 7th and 8th of 1.txt) and print a string when there is a match, but I cannot print the 1st column of 2.txt.
My code:
awk 'NR==FNR {a[$2 FS $3];next} {print $0 FS (($7 FS $8) in a?"exists":"none")}' 2.txt 1.txt
What it produces (which is not what I want):
chr1 10 20 . . + chr1 30 40 ABC . + exists
chr2 11 22 . . + chr2 90 92 XXX . - exists
chrX 33 42 . . + chrX 70 80 XXX . + exists
chr4 3 12 . . + chr4 70 80 ZZZ . + none
How can I replace this new 13th column with the corresponding 1st column of 2.txt, and print only the matching lines?
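For reference, here is a minimal sketch of one way the four-column match could be written. Judging by the expected output above, the value to append is the 1st column of 2.txt. The sketch assumes both files are whitespace-separated, so no -F option is used. The array name `id`, the `workdir` scratch directory and the `result` variable are made up for illustration. While reading 2.txt it stores the 1st column under a key built from columns 2 to 5, then looks up columns 7 to 10 of each line of 1.txt.

```shell
# Recreate the two sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/1.txt" <<'EOF'
chr1 10 20 . . + chr1 30 40 ABC . +
chr2 11 22 . . + chr2 90 92 XXX . -
chrX 33 42 . . + chrX 70 80 XXX . +
chr4 3 12 . . + chr4 70 80 ZZZ . +
EOF
cat > "$workdir/2.txt" <<'EOF'
1 chr1 30 40 ABC
3 chr1 35 40 ABC
27 chr2 90 92 XXX
1 chrX 70 80 XXX
2 chrY 12 13 XXX
EOF

# First file (2.txt): store column 1 under a key made of columns 2-5.
# Second file (1.txt): when columns 7-10 form a stored key, print the
# whole line followed by the stored value; other lines are skipped.
result=$(awk '
    NR == FNR { id[$2, $3, $4, $5] = $1; next }
    ($7, $8, $9, $10) in id { print $0, id[$7, $8, $9, $10] }
' "$workdir/2.txt" "$workdir/1.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Two caveats with this sketch: if 2.txt contains the same four-column key more than once, the last 1st-column value wins, and the NR==FNR test misbehaves when 2.txt is empty.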