I am using awk to count the lengths of reads in a directory of FASTQ files, using the implementation suggested here. It lists each read length together with its number of occurrences.
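To make the counting step concrete, here is a minimal sketch on a tiny made-up FASTQ file. The temporary directory, the file name `sample.fastq` and the reads are hypothetical sample data. It pipes `gzip -dc` into awk rather than using process substitution so that it also runs under plain `sh`. Because each FASTQ record is four lines long, `NR%4 == 2` selects only the sequence lines.

```shell
#!/bin/sh
# Minimal demo of the read-length histogram on hypothetical sample data.
set -e
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Three made-up records: two reads of length 4, one read of length 6.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nTTGA\n+\nIIII\n' > "$tmpdir/sample.fastq"
gzip "$tmpdir/sample.fastq"

# Line 2 of every 4-line record is the sequence; count reads per length.
# awk's for-in order is unspecified, so sort numerically for stable output.
result=$(gzip -dc "$tmpdir/sample.fastq.gz" |
  awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}' |
  sort -n)
printf '%s\n' "$result"
```

Each output line is "length count", so this sample gives one row for length 4 (two reads) and one for length 6 (one read).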
I would like to implement this in a loop like so:
for i in $(ls ./Raw_data); do
    awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}' <(gzip -dc "./Raw_data/$i")
done
However, while doing this I would like to record which file each count comes from, so that the output forms a single table. I would therefore like to print the file name in each awk print statement.
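A minimal sketch of one common way to get the name into awk: pass it in explicitly with `awk -v`, since awk only sees the decompressed stream. The variable name `fname`, the temporary `Raw_data` directory and the two gzipped files are hypothetical stand-ins for the real data. The sketch also swaps `$(ls ...)` for a glob so names containing spaces are not split, and uses a plain pipe instead of process substitution so it runs under `sh`.

```shell
#!/bin/sh
# Sketch: prefix each "length count" row with its source file name.
# Assumption: the name is supplied via awk -v (hypothetical variable "fname").
set -e
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
mkdir "$tmpdir/Raw_data"

# Two tiny, made-up gzipped FASTQ files standing in for ./Raw_data.
printf '@a\nACGT\n+\nIIII\n' | gzip > "$tmpdir/Raw_data/s1.fastq.gz"
printf '@b\nAC\n+\nII\n@c\nGT\n+\nII\n' | gzip > "$tmpdir/Raw_data/s2.fastq.gz"

# Loop over the files; basename drops the directory part of each path.
table=$(
  for f in "$tmpdir"/Raw_data/*.fastq.gz; do
    name=$(basename "$f")
    gzip -dc "$f" |
      awk -v fname="$name" 'NR%4 == 2 {lengths[length($0)]++}
        END {for (l in lengths) print fname, l, lengths[l]}'
  done | sort
)
printf '%s\n' "$table"
```

Each row then reads "file length count". Note that `-v` interprets backslash escapes in the assigned value, so file names containing backslashes would be altered.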
I have tried:
awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print $i, l, lengths[l]}}' <(gzip -dc "./Raw_data/$i")
awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print FILENAME, l, lengths[l]}}' <(gzip -dc "./Raw_data/$i")
but neither prints the file name. I think this is because the input reaches awk through a pipe (process substitution) rather than as a named file.
How can I achieve this?