0

I'm a bioinformatics student and new to bash and programming stuff. I want to calculate genome coverage.

This is my script. I switch the real parameter with xx but I'm sure xxs are not problematic. Other students already execute this script with no error.

filename=$1
reference=/xx
filebase=$(basename $filename .bam)

samtools view ${filename} -F 4 -q 30 -b > ${filebase}.f.bam

genomeCoverageBed -ibam -g ${filebase}.f.bam  ${reference} > /mnt/ABC/projects/abc/def/${filebase}.cov

coverage=$(grep genome /mnt/ABC/projects/abc/def/${filebase}.cov | awk '{NUM+=$2*$3; DEN+=$3} END {print NUM/DEN}')

echo -e "${filename},${coverage}" >> coverages.txt

When I execute this script with

sh ./coverage.sh /mnt/XYZ/share/sdf_rawdata/hsa/mergedbams/ghj_merged_200203.hs37d5.cons.90perc.bam

it doesn't work and gives me: awk: cmd. line:1: fatal: division by zero attempted and unrecognized parameter:/mnt/XYZ/share/sdf_rawdata/hsa/mergedbams/ghj_merged_200203.hs37d5.cons.90perc.bam error and in the coverages.txt file it only has this line:

-e /mnt/NEOGENE2/share/compevo_rawdata/hsa/mergedbams/Ash128_all.merged.hs37d5.fa.cons.90perc.bam, nothing more.

Thanks for helping.

AJM
  • 1,317
  • 2
  • 15
  • 30
azlilili
  • 25
  • 3
  • It means `DEN` equals zero. You did not show the content of the offending line in `$filebase.cov`, so we can't know the 3rd field and hence don't know how `DEN` has been calculated – user1934428 Nov 16 '20 at 07:05
  • As an aside, your script will fail if the user passes in a filename with spaces or other shell metacharacters in it. See [When to wrap quotes around a shell variable](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) (TLDR: basically always). – tripleee Nov 16 '20 at 08:41

1 Answers1

3

You need to put a condition to check if DEN variable is NOT NULL then only do the division in END block of awk code(trying to fix OP's attempt here).

coverage=$(grep genome /mnt/ABC/projects/abc/def/${filebase}.cov | awk '{NUM+=$2*$3; DEN+=$3} END {if(DEN){print NUM/DEN}}')

You need not to use grep command along with awk, we could search string in awk itself, may be something like:

coverage=$(awk '/genome/{NUM+=$2*$3; DEN+=$3} END {if(DEN){print NUM/DEN}}' "/mnt/ABC/projects/abc/def/${filebase}.cov")



Why error is coming: Because its sometimes your variable DEN is having zero values in it. Let's take an example here in shorter way(just some examples to understand the error here):

When variable a is NULL then also we get same error.

awk 'BEGIN{a="";b=1;print b/a}'
awk: cmd. line:1: fatal: division by zero attempted

When variable a is zero then also we get same error:

awk 'BEGIN{a=0;b=1;print b/a}'
awk: cmd. line:1: fatal: division by zero attempted
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
  • Notice also that anything which looks like `grep 'x' | awk '{ y }'` can be usefully refactored to `awk '/x/ { y }'`; see also [useless use of `grep`.](http://www.iki.fi/era/unix/award.html#grep) – tripleee Nov 16 '20 at 08:39
  • @tripleee, Thanks, as I mentioned I was fixing OP's attempt, let me write a note for this one too. – RavinderSingh13 Nov 16 '20 at 08:40
  • 1
    Programs should not silently fail on bad input. Throwing the divide by zero error is clear. – stark Nov 16 '20 at 12:07