
I'm trying to understand awk and adapted a script from https://www.tecmint.com/learn-use-awk-special-patterns-begin-and-end/ to search multiple files for multiple patterns and count them. I tested it in a folder with three .csv files, of which only the first contains the patterns. This worked when the patterns were written directly into the awk program.

script:

```shell
#!/bin/bash
for file in $(ls *.csv); do
  if [ -f "$file" ]; then
    # print out the file name
    echo "File is: $file"
    # count the lines matching phn and the lines matching phnM
    awk ' BEGIN { print "The number of times phn_phnM appears in the file is:" }
      /phn/  { counterx += 1 }
      /phnM/ { countery += 1 }
      END    { printf "%s\n", counterx }
      END    { printf "%s\n", countery }
    ' "$file"
  else
    # print error info in case the input is not a file
    echo "$file is not a file, please specify a file." >&2 && exit 1
  fi
done
# terminate the script with exit code 0 on successful execution
exit 0
```

output:

```
bash ./unix_commands/count_genes.awk
File is: omics_collection.csv
The number of times phn_phnM appears in the file is:
970
84
File is: temp.csv
The number of times phn_phnM appears in the file is:


File is: temp2.csv
The number of times phn_phnM appears in the file is:
```

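The blank lines printed for temp.csv and temp2.csv come from awk printing a counter that was never assigned: an unset awk variable prints as the empty string. Below is a minimal sketch, assuming the same two patterns as above, that forces a numeric 0 by adding 0 in the END block. It also shows two details of the original script: awk counts matching *lines*, not individual occurrences, and every line containing phnM also contains phn, so such a line is counted by both rules. The function name `count_phn` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count lines matching /phn/ and /phnM/ in the given
# files (or on standard input if no file is given).
count_phn() {
  awk '
    /phn/  { counterx++ }   # any line containing "phn" (this includes "phnM" lines)
    /phnM/ { countery++ }   # only lines containing "phnM"
    END {
      # "+ 0" turns an unset counter into 0 instead of an empty string
      print counterx + 0
      print countery + 0
    }
  ' "$@"
}

# Demo on inline data: prints 2, then 1
printf 'phn_a\nphnM_b\nxyz\n' | count_phn
```

With a real file it would be called as `count_phn temp.csv`, which prints `0` twice instead of two blank lines.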
But when I tried to use shell variables for the patterns, the script no longer worked.

EDIT: As @Charles Duffy pointed out, this was because awk variables and bash variables are not the same thing, which I was completely unaware of. I adapted my script to pass the variables set in the shell to awk, and now it does what I want:

```shell
#!/bin/bash
GENE1="NA"
GENE2="fadD"
for file in *.csv; do
  if [ -f "$file" ]; then
    # print out the file name
    echo "File is: $file"
    # count the lines matching each gene name in the file
    awk -v a="$GENE1" -v b="$GENE2" '
      BEGIN  { print "The number of times", a, "and", b, "appear in the file are:" }
      $0 ~ a { a_counter += 1 }
      $0 ~ b { b_counter += 1 }
      END {
        print a, a_counter
        print b, b_counter
      }
    ' "$file"
  else
    # print error info in case the input is not a file
    echo "$file is not a file, please specify a file." >&2 && exit 1
  fi
done
# terminate the script with exit code 0 on successful execution
exit 0
```
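In the script above, `/a/` and `$0 ~ a` behave differently: `/a/` is a regex constant and always means "the letter a", while `$0 ~ a` uses the *value* of the awk variable `a` (passed in from the shell with `-v`) as a dynamic regex. A small sketch on made-up inline lines shows the difference; the function name `compare_patterns` and the sample lines are hypothetical.

```shell
#!/bin/bash
# Hypothetical demo: regex constant vs. dynamic regex in awk.
compare_patterns() {
  printf '%s\n' "fadD kinase" "alpha" "NA value" |
    awk -v a="$1" '
      /a/    { literal++ }   # regex constant: lines containing the letter "a"
      $0 ~ a { dynamic++ }   # dynamic regex: lines matching the value of a
      END    { print literal + 0, dynamic + 0 }
    '
}

compare_patterns fadD   # prints "3 1": all three lines contain "a", one contains "fadD"
```

Because the value is used as a regex, characters such as `.` or `*` in a gene name are treated as regex operators. If a plain substring test is wanted instead, `index($0, a) > 0` does that in POSIX awk.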

I will still have to look into this "dynamic" regex pattern to understand what I actually did there. But I understand now that awk does not expand variables inside a `/.../` regex constant, so `/a/` as a pattern was actually counting the lines that contain the letter a in my files. I also had to replace

```awk
END {  printf "%s\n", a, a_counter ; }
```

with

```awk
END {  print a, a_counter ; }
```

because the `printf` would only print the value of `a`, but not of `a_counter`, and I couldn't figure out why. I assume that `a_counter` inside awk will not be recognized as `$(GENE1)_counter`?
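On the `printf` question: awk's `printf` consumes one argument per conversion specifier in its format string, so `"%s\n"` prints only `a`, and the extra argument `a_counter` is ignored. Also, `a_counter` is just a literal awk variable name; awk does not build it from the value of `a`. A minimal sketch with one specifier per value (the helper name `print_count` and the sample input are hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: print a gene name and the number of lines matching it.
# Usage: print_count GENE [FILE...]   (reads standard input if no file is given)
print_count() {
  local gene=$1
  shift
  awk -v a="$gene" '
    $0 ~ a { a_counter++ }
    # two values -> two conversion specifiers: %s for the name, %d for the count
    END    { printf "%s %d\n", a, a_counter }
  ' "$@"
}

printf 'fadD x\nfadD y\nNA z\n' | print_count fadD   # prints "fadD 2"
```

Using `%d` for the count also prints `0` rather than an empty field when nothing matched, since an unset variable converts to 0 as a number.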

crazysantaclaus
  • aside: `for file in *.csv` is **far** less buggy than `for file in $(ls *.csv)`; see [BashPitfalls #1](http://mywiki.wooledge.org/BashPitfalls#for_i_in_.24.28ls_.2A.mp3.29). – Charles Duffy Feb 25 '18 at 19:17
  • ...but on-point: Parameter expansions (`$GENE1` and `$GENE2`) don't happen inside single quotes, and you pass the script into awk... *in single quotes*. That's a good thing, though -- if your code worked the way you currently want it to, you'd have security bugs (folks who could control the values to search for could also make `awk` run completely arbitrary code, including arbitrary shell commands). Use the `awk -v` argument as given in the answer we're flagged as duplicate of to pass shell variables as awk variables. – Charles Duffy Feb 25 '18 at 19:21
  • @CharlesDuffy: thanks for the link, I included the information and now it (almost) completely works as it is supposed to be ;-) – crazysantaclaus Feb 25 '18 at 22:15
  • If you still have an open question, could you edit this into a [mcve] that isolates it (with only the shortest code necessary to demonstrate the still-remaining problem)? This is a lot of prose to parse through right now. – Charles Duffy Feb 25 '18 at 22:19
  • @CharlesDuffy: I'll do that, thanks! – crazysantaclaus Feb 25 '18 at 22:58
  • Thank you. No need to use EDIT: markers and the like, since historical diffs are available for all to view -- the goal should be to have as readable as possible a question for people seeing it for the first time. Ping me once you're done, and I'll either re-open the question or change the duplicate list to something more appropriate in light of the edits (or someone else with a gold-badge in the tag can do so themselves, if they spot it first). – Charles Duffy Feb 25 '18 at 23:21
  • @CharlesDuffy: I opened up a new question under https://stackoverflow.com/questions/48979343/what-is-the-correct-syntax-for-awks-printf-to-insert-multiple-variables – crazysantaclaus Feb 25 '18 at 23:27
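The first aside (BashPitfalls #1) is easy to reproduce with a file name that contains a space. This sketch creates a temporary directory holding one file and compares the two loops: the output of `$(ls *.csv)` is split on whitespace, while the glob yields the name intact. The helper name `demo_glob_vs_ls` and the file name `my data.csv` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical demo: word splitting of $(ls ...) versus a plain glob.
demo_glob_vs_ls() {
  local dir
  dir=$(mktemp -d)
  touch "$dir/my data.csv"
  (
    cd "$dir" || exit 1
    for f in $(ls *.csv); do echo "ls: $f"; done   # split into "my" and "data.csv"
    for f in *.csv; do echo "glob: $f"; done       # one item: "my data.csv"
  )
  rm -rf "$dir"
}

demo_glob_vs_ls
```

With the `$(ls *.csv)` loop, `[ -f "$file" ]` would fail for both fragments and the original script would exit with its "is not a file" error.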
