I have a bash script with two nested for loops that reads each line of one text file and then greps for that line in a different file. The text files (${AC}.ac.txt) are all lists like:
SF=0
SF=0,1
SF=0,2
SF=1
SF=1,2
SF=2
but with varying SF= options. I need grep to return only lines with an exact match, not similar ones (e.g. SF=1 but not SF=1,2). I have tried many different grep patterns, such as grep "[$SF[:blank:]]", grep -P "${SF}\t", grep "$SF ", grep -P "${SF} GT" (the grep target is always followed by GT), grep -P "${SF}\tGT", etc., with no luck. I either get an empty file, or it doesn't filter out the other SF= options. I think the issue may be the way grep handles commas when the bash variable is expanded? Can anyone help me with this?
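For reference, a plain `grep "$SF"` is a substring search, so SF=1 also matches inside SF=1,2. Commas and `=` have no special meaning in grep's regular-expression syntax, so they are matched literally; what the pattern needs is a boundary on each side of the entry. `grep -w` does not provide one here: `,` is not a word character, so SF=1 still counts as a whole word inside SF=1,2. The sketch below assumes the layout seen in the sample lines (the SF entry preceded by `;` and followed by whitespace, or possibly by another `;`). `sf_grep` is a hypothetical helper name, and stripping a trailing carriage return is only a precaution in case the lists were saved with Windows line endings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of a file whose SF entry is exactly $1.
# Assumes the entry is preceded by ';' or whitespace (or starts the line) and
# is followed by ';', whitespace, or the end of the line.
sf_grep() {
  local sf=$1 file=$2
  # Precaution: drop a trailing carriage return (CRLF list files).
  sf=${sf%$'\r'}
  # An empty entry would match almost any line, so refuse it.
  [[ -n $sf ]] || return 1
  # ',' and '=' are literal in ERE, so the value can be spliced in as-is
  # (a value containing . * [ or similar would need escaping first).
  grep -E "(^|[;[:space:]])${sf}([;[:space:]]|\$)" "$file"
}
```

With this, `sf_grep SF=1 2_tmp1.vcf` prints the lines whose entry is exactly SF=1 and skips both SF=1,2 and a hypothetical SF=10.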
The loop is as follows:
for AC in {2..5}; do
    for SF in $(cat "${AC}.ac.txt"); do
        grep "${SF}" "${AC}_tmp1.vcf" > "${AC}_tmp2.vcf"
    done
done
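Separately from the pattern itself, the `>` inside the inner loop truncates ${AC}_tmp2.vcf on every iteration, so only the matches for the last SF in each list survive; if that last value does not occur in the VCF, the result is an empty file. Assuming the goal is one output file per AC holding the lines for every SF listed in ${AC}.ac.txt, a single awk pass per AC can do the filtering by comparing each `;`-separated INFO entry for exact string equality instead of pattern matching. This sketch assumes a tab-separated VCF with INFO in column 8 (as the VCF format specifies); `filter_ac` is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Hypothetical rewrite: for one AC, keep the lines of ${AC}_tmp1.vcf whose INFO
# column contains exactly one of the SF entries listed in ${AC}.ac.txt.
# Assumes a tab-separated VCF with INFO in column 8.
filter_ac() {
  local AC=$1
  awk -F '\t' '
    NR == FNR {                     # first file: the list of wanted SF entries
      sub(/\r$/, "")                # tolerate Windows (CRLF) line endings
      if ($0 != "") want[$0] = 1
      next
    }
    {
      n = split($8, kv, ";")        # INFO entries are separated by ;
      for (i = 1; i <= n; i++)
        if (kv[i] in want) { print; next }   # exact match, so SF=1 != SF=1,2
    }' "${AC}.ac.txt" "${AC}_tmp1.vcf" > "${AC}_tmp2.vcf"
}

for AC in {2..5}; do
  [[ -f ${AC}.ac.txt && -f ${AC}_tmp1.vcf ]] || continue   # skip missing inputs
  filter_ac "$AC"
done
```

Because the whole list is loaded once per AC, this also avoids re-reading the VCF once per SF value. If a separate output file per SF was intended instead, the output name would need to include the SF value.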
And here are a few example lines from the target file (${AC}_tmp1.vcf):
NW_024423319.1 55690 . A C 407.13 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=-2.153;ClippingRankSum=0;DP=27;ExcessHet=3.0103;FS=5.787;MQ=60;MQRankSum=0;QD=15.08;ReadPosRankSum=-0.519;SF=2 GT:GQ:PL:AD:DP .:.:.:.:. .:.:.:.:. 0/1:99:438,0,374:11,16:27
NW_024423319.1 55742 . T A 1396.9 PASS AC=3;AF=0.5;AN=4;BaseQRankSum=0.716;ClippingRankSum=0;DP=57;ExcessHet=1.549;FS=0;MQ=49.3;MQRankSum=-0.537;QD=24.51;ReadPosRankSum=0.588;SF=1,2 GT:GQ:PL:AD:DP .:.:.:.:. 0/1:99:272,0,731:20,9:29 1/1:84:1161,84,0:0,28:28
NW_024423319.1 65778 . G C 1445.14 PASS AC=4;AF=1;AN=4;DP=35;ExcessHet=0.4576;FS=0;MQ=49.22;QD=30.73;SF=1,2 GT:DP:AD:PL:GQ .:.:.:.:. 1/1:19:0,19:794,57,0:57 1/1:16:0,16:689,48,0:48
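The sample lines above show spaces between the columns, which may only be a side effect of pasting. Still, a pattern containing `\t` can only match if the file really uses tabs, and a value read from a list with Windows line endings carries a trailing carriage return that would stop `${SF}\t` or `${SF} GT` from matching. A quick way to check both, sketched under those assumptions (`check_file` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: report how many lines of a file contain a tab and
# how many end in a carriage return (a sign of Windows/CRLF line endings).
check_file() {
  local f=$1 tabs crs
  tabs=$(grep -c $'\t' "$f")     # grep -c still prints 0 when nothing matches
  crs=$(grep -c $'\r$' "$f")
  printf '%s: %s line(s) with a tab, %s line(s) ending in CR\n' "$f" "$tabs" "$crs"
}
```

For example, `check_file 2.ac.txt` and `check_file 2_tmp1.vcf`. If a list turns out to have CR endings, `tr -d '\r' < 2.ac.txt > 2.ac.unix.txt` (output name hypothetical) writes a copy without them.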
Thank you!!