I have an array of a few values that I loop over. In each iteration I want awk
to check one column against the current loop value and another column against a fixed string.
My script looks like this:
wget https://ftp.ensembl.org/pub/release-109/gtf/homo_sapiens/Homo_sapiens.GRCh38.109.gtf.gz
gunzip Homo_sapiens.GRCh38.109.gtf.gz
declare -a arr=("gene" "exon" "transcript" "three_prime_utr" "five_prime_utr")
for i in "${arr[@]}"
do
echo "$i"
tail -n +6 Homo_sapiens.GRCh38.109.gtf | awk '{ if ($3==$i && $7="+") {print $0}}' > Homo_sapiens.GRCh38.109.$i.gtf
head -n5 Homo_sapiens.GRCh38.109.gtf | cat - Homo_sapiens.GRCh38.109.$i.gtf > Homo_sapiens.GRCh38.109.$i.gtf.
mv Homo_sapiens.GRCh38.109.$i.gtf. Homo_sapiens.GRCh38.109.$i.gtf
done
rm Homo_sapiens.GRCh38.109.gtf
The result is as follows:
$ wc -l *.gtf
5 Homo_sapiens.GRCh38.109.exon.gtf
5 Homo_sapiens.GRCh38.109.five_prime_utr.gtf
5 Homo_sapiens.GRCh38.109.gene.gtf
3420366 Homo_sapiens.GRCh38.109.gtf
5 Homo_sapiens.GRCh38.109.three_prime_utr.gtf
5 Homo_sapiens.GRCh38.109.transcript.gtf
This means I am unable to use $i properly inside the awk command.
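A likely explanation for the 5-line files: the awk program is in single quotes, so the shell never expands $i. Inside awk, i is an unset awk variable, so $i evaluates to $0 (the whole line), and $3==$i can never match. Only the 5 header lines end up in each file. Below is a minimal sketch of passing the shell value in with -v, assuming tab-separated input. The file sample.gtf and the awk variable name feature are made up for illustration and are not part of the original script.

```shell
# Hypothetical three-record sample standing in for the body of the real GTF file.
printf 'chr1\thavana\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n' > sample.gtf
printf 'chr1\thavana\texon\t1\t50\t.\t+\t.\tgene_id "g1";\n' >> sample.gtf
printf 'chr1\thavana\texon\t60\t100\t.\t-\t.\tgene_id "g2";\n' >> sample.gtf

for i in gene exon; do
    # -v copies the shell value of $i into the awk variable "feature" (assumed name).
    # == compares; a single = would assign "+" to field 7 instead.
    awk -F'\t' -v feature="$i" '$3 == feature && $7 == "+"' sample.gtf > "sample.$i.gtf"
done
```

With this sample, each output file should contain only the plus-strand record for its feature type.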
If I run the commands individually, e.g. for exon:
tail -n +6 Homo_sapiens.GRCh38.109.gtf | awk '{ if ($3=="exon" &&$7="+") {print $0}}' > Homo_sapiens.GRCh38.109.exon.gtf
head -n5 Homo_sapiens.GRCh38.109.gtf | cat - Homo_sapiens.GRCh38.109.exon.gtf > Homo_sapiens.GRCh38.109.exon.gtf.
mv Homo_sapiens.GRCh38.109.exon.gtf. Homo_sapiens.GRCh38.109.exon.gtf
I get:
1648283 Homo_sapiens.GRCh38.109.exon.gtf
5 Homo_sapiens.GRCh38.109.five_prime_utr.gtf
5 Homo_sapiens.GRCh38.109.gene.gtf
3420366 Homo_sapiens.GRCh38.109.gtf
5 Homo_sapiens.GRCh38.109.three_prime_utr.gtf
5 Homo_sapiens.GRCh38.109.transcript.gtf
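Note that the exon count above may include minus-strand records. $7="+" is an assignment, not a comparison: it always evaluates as true, overwrites the strand column, and makes awk rebuild the line with single spaces in place of the tabs. A small sketch of the difference, using a made-up minus-strand record (the variable names line, assigned and compared are for illustration only):

```shell
# Hypothetical minus-strand exon record, tab-separated.
line=$(printf 'chr1\thavana\texon\t60\t100\t.\t-\t.\tg2')

# Assignment: the condition is always true and field 7 is rewritten to "+".
assigned=$(printf '%s\n' "$line" | awk '$3 == "exon" && ($7 = "+")')

# Comparison: the minus-strand record is filtered out, so nothing is printed.
compared=$(printf '%s\n' "$line" | awk '$3 == "exon" && $7 == "+"')

printf 'assigned: %s\ncompared: %s\n' "$assigned" "$compared"
```

Here the assignment version still prints the record, now marked as plus strand, while the comparison version prints nothing.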