
I have a .bed file with 4 columns; the first column contains the chromosome number. I need to write a bash script that selects every row belonging to a specific chromosome, subtracts the second column from the third in those rows (this gives the length of the gene), and then calculates the average length of the genes on that chromosome. I have to do this for every chromosome.

This code calculates the average length over the whole table, but I need to do it separately for each chromosome.

`#!/bin/bash

input_bed=${1}

awk 'BEGIN {
        FS="\t"
        sum=0
    }
    {
        sum+=$3-$2
    } END {
        print sum / NR;
    }' ${input_bed}

#Exiting
exit`
Cyrus
Rames
  • Please add sample input (no descriptions, no images, no links) and your desired output for that sample input to your question (no comment). – Cyrus Nov 19 '22 at 17:33
  • `awk '$1~/1/'` filters for `1` in the first column. Your question needs sample input / desired output for a more complete answer. – dawg Nov 19 '22 at 18:00

1 Answer


You can put a condition (a pattern) before the line-processing block; the block then runs only on input lines that satisfy the condition. Swap "1" for whichever chromosome you are investigating.


input_bed=${1}

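awk 'BEGIN {
        FS="\t"
        sum=0
        cnt=0
    }
    # The pattern and the opening brace must share a line; == compares, = would assign.
    $1 == "1" {
        sum+=$3-$2
        cnt++
    } END {
        # Divide by the number of matching rows, not NR (which counts every row).
        if (cnt > 0) print sum / cnt;
    }' "${input_bed}"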

#Exiting
exit
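
The chromosome label can also be passed in from the shell instead of being edited into the awk program. A minimal sketch, assuming a hypothetical helper name `avg_len_for_chrom` and using awk's standard `-v` option to set the variable `chr` (also a name chosen here for illustration):

```shell
#!/bin/bash
# Hypothetical helper: average gene length for one chromosome.
# $1 = tab-separated BED file, $2 = chromosome label as found in column 1.
avg_len_for_chrom() {
    awk -v chr="$2" 'BEGIN { FS = "\t" }
        $1 == chr {
            sum += $3 - $2   # gene length = end - start
            cnt++            # count only the matching rows
        }
        END {
            # Print nothing if the chromosome never occurs in the file.
            if (cnt > 0) printf "%.2f\n", sum / cnt
        }' "$1"
}

# Example call (genes.bed is a placeholder file name):
#   avg_len_for_chrom genes.bed 1
```

Keeping the label outside the awk program means the same script can be called once per chromosome without editing it.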

Alternatively, you can do it all in one pass by accumulating the sums and row counts in associative arrays keyed by chromosome.


input_bed=${1}

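awk 'BEGIN {
        FS="\t"
    }
    {
        # Accumulate total length and row count per chromosome (column 1).
        sum[$1]+=$3-$2
        cnt[$1]+=1
    } END {
       # awk does not guarantee the iteration order of "for (key in array)".
       for (chromosome in cnt) {
          print "Avg of Chromosome " chromosome " is " (sum[chromosome] / cnt[chromosome]);
       }
    }' "${input_bed}"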

#Exiting
exit
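
Since the question came without sample input, here is a small worked example of the one-pass approach on made-up data. The helper name `avg_len_per_chrom` and the three sample rows are assumptions for illustration only; the output is piped through `sort` because awk's array iteration order is unspecified.

```shell
#!/bin/bash
# Hypothetical helper: print "<chromosome><TAB><average length>" per chromosome.
avg_len_per_chrom() {
    awk 'BEGIN { FS = "\t" }
        NF >= 3 {                  # skip blank or malformed lines
            sum[$1] += $3 - $2     # total gene length per chromosome
            cnt[$1]++              # number of genes per chromosome
        }
        END {
            for (c in cnt) printf "%s\t%.2f\n", c, sum[c] / cnt[c]
        }' "$1" | sort -k1,1
}

# Made-up sample: two genes on chromosome 1, one on chromosome 2.
sample=$(mktemp)
printf '1\t100\t200\tgeneA\n1\t300\t600\tgeneB\n2\t50\t150\tgeneC\n' > "$sample"
avg_len_per_chrom "$sample"
rm -f "$sample"
```

For this sample the script prints `1` with `200.00` (lengths 100 and 300) and `2` with `100.00`, one chromosome per line.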