I'm trying to understand awk and modified a script I found here: https://www.tecmint.com/learn-use-awk-special-patterns-begin-and-end/ I would like to search multiple files for multiple patterns and count the matching lines. I tested it in a folder with three .csv files, only the first of which contains the patterns. This worked when the patterns were hard-coded in the awk program.
script:
#!/bin/bash
for file in *.csv; do
if [ -f "$file" ] ; then
#print out filename
echo "File is: $file"
#print the total number of times phn_phnM appears in the file
awk ' BEGIN { print "The number of times phn_phnM appears in the file is:" ; }
/phn/ { counterx+=1 ; }
/phnM/ { countery+=1 ; }
END { printf "%s\n", counterx ; }
END { printf "%s\n", countery ; }
' "$file"
else
#print error info in case input is not a file
echo "$file is not a file, please specify a file." >&2 && exit 1
fi
done
#terminate script with exit code 0 in case of successful execution
exit 0
output:
bash ./unix_commands/count_genes.awk
File is: omics_collection.csv
The number of times phn_phnM appears in the file is:
970
84
File is: temp.csv
The number of times phn_phnM appears in the file is:
File is: temp2.csv
The number of times phn_phnM appears in the file is:
But when I tried to use variables for the patterns, the script no longer ran.
EDIT: As pointed out by @Charles Duffy, this was because awk variables and bash variables are not the same thing, which I was completely unaware of. I adapted my script to pass the shell variables to awk, and now it does what I want:
#!/bin/bash
GENE1="NA"
GENE2="fadD"
for file in *.csv; do
if [ -f "$file" ] ; then
#print out filename
echo "File is: $file"
#print the total numbers of genes in the files
awk -v a="$GENE1" -v b="$GENE2" ' BEGIN { print "The number of times", a, "and", b " appear in the file are:" ; }
$0 ~ a { a_counter+=1 ; }
END { print a, a_counter ; }
$0 ~ b { b_counter+=1 ; }
END { print b, b_counter ; }
' "$file"
else
#print error info in case input is not a file
echo "$file is not a file, please specify a file." >&2 && exit 1
fi
done
#terminate script with exit code 0 in case of successful execution
exit 0
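For anyone comparing the two pattern forms, here is a minimal sketch of the difference between `/a/` and `$0 ~ a`. The file `sample.csv` and its three rows are made up for illustration and are not my real data. Only `awk -v` and the `~` match operator are standard awk.

```shell
#!/bin/bash
# Hypothetical sample data, not taken from the real CSV files.
tmpdir=$(mktemp -d)
printf '%s\n' 'fadD,1' 'phnM,2' 'banana,3' > "$tmpdir/sample.csv"

GENE="fadD"

# /a/ is a regex literal: it matches every line containing the letter "a",
# no matter what the awk variable a holds.
literal_count=$(awk -v a="$GENE" '/a/ { n++ } END { print n+0 }' "$tmpdir/sample.csv")

# $0 ~ a uses the *value* of a ("fadD") as the regex for each line.
dynamic_count=$(awk -v a="$GENE" '$0 ~ a { n++ } END { print n+0 }' "$tmpdir/sample.csv")

echo "literal /a/ matched $literal_count line(s)"
echo "dynamic \$0 ~ a matched $dynamic_count line(s)"

rm -rf "$tmpdir"
```

The `n+0` in the END block forces a numeric result, so a file without any match reports 0 instead of an empty line.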
I will have to look into this "dynamic" search pattern thing, though, to understand what I actually did there. But I understood that variables are not expanded inside /.../, so /a/ as a pattern was actually counting the lines that contain the letter "a" in my files. I also had to replace
END { printf "%s\n", a, a_counter ; }
with
END { print a, a_counter ; }
as the printf would only print the value of "a", but not of "a_counter", and I couldn't figure out why. I assume that "a_counter" inside awk will not be recognized as ${GENE1}_counter?
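On the printf question, here is a small sketch of what seems to be going on. It assumes the usual awk behaviour: printf prints one argument per conversion specifier in the format string, so "%s\n" consumes a and the extra a_counter is ignored. The gene name and the count 84 are just placeholder values.

```shell
#!/bin/bash
# One %s in the format: only the first argument is printed, the extra one is ignored.
one_spec=$(awk -v a="fadD" 'BEGIN { a_counter = 84; printf "%s\n", a, a_counter }')

# Two specifiers: both values are printed.
two_spec=$(awk -v a="fadD" 'BEGIN { a_counter = 84; printf "%s %d\n", a, a_counter }')

# print joins its comma-separated arguments with OFS (a space by default).
plain=$(awk -v a="fadD" 'BEGIN { a_counter = 84; print a, a_counter }')

echo "$one_spec"
echo "$two_spec"
echo "$plain"
```

As far as I can tell, a_counter is just an ordinary awk variable name. awk does not substitute the value of a (or of GENE1) into it, so that is not what made the count disappear.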