I have a tabular file with 6 columns. I need to add a 7th column that, for each row, counts how many times that row's column 3 value occurs in the whole file. I did it in Excel by adding the formula
=countif(C:C,$C1)
But the files are huge, and I have lots of them, so doing this in Excel is not practical.
For example, my input is this:
0 SL3.0ch03 7675648 21M GATCACTCCAAACTCATCATA NM:i:2
0 SL3.0ch03 7675648 21M GATCACTCCAAACTCATCATA NM:i:2
0 SL3.0ch03 7675648 21M GATCACTCCAAACTCATCATA NM:i:2
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1
0 SL3.0ch03 7675649 21M CTCACTCCAAACTCATCATAC NM:i:2
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1
0 SL3.0ch03 7675650 21M TCACTCCAAACTCATCATACT NM:i:1
0 SL3.0ch03 7675650 21M TCACTCCAAACTCATCATACT NM:i:1
0 SL3.0ch03 7675650 21M TCACTCCAAACTCATCATACT NM:i:1
0 SL3.0ch03 7675650 21M TCACTCCAAACTCATCATACT NM:i:1
And I need an output like this one:
0 SL3.0ch03 7675648 21M GATCACTCCAAACTCATCATA NM:i:2 3
0 SL3.0ch03 7675648 21M GATCACTCCAAACTCATCATA NM:i:2 3
0 SL3.0ch03 7675648 21M GATCACTCCAAACTCATCATA NM:i:2 3
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1 5
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1 5
0 SL3.0ch03 7675649 21M CTCACTCCAAACTCATCATAC NM:i:2 5
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1 5
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1 5
0 SL3.0ch03 7675650 21M TCACTCCAAACTCATCATACT NM:i:1 4
0 SL3.0ch03 7675650 21M TCACTCCAAACTCATCATACT NM:i:1 4
0 SL3.0ch03 7675650 21M TCACTCCAAACTCATCATACT NM:i:1 4
0 SL3.0ch03 7675650 21M TCACTCCAAACTCATCATACT NM:i:1 4
I've tried a few things that I found:
awk '{h[$3]++}; END { for(k in h) print k, h[k] }' input.tab
That prints each distinct column 3 value together with its count (the number I want in the 7th column), but not the rest of the columns. I also found that this code:
awk '{print $1,$2,$3,$4,$5,$6}'
prints all the columns, so I thought "this should work":
awk '{print $1,$2,$3,$4,$5,$6,$7};{h[$3]++}; END { for(k in h) print k, h[k] }' input.tab > output.tab
but it didn't. The best I could achieve was to print all 6 original columns, followed by the counts at the bottom of the file, but I need the count as a 7th column on every line.
I'm familiar with basic shell commands, but not with the AWK language.
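To make the goal concrete, here is a minimal sketch of one approach that might produce the desired output: have awk read the same file twice, counting column 3 values on the first pass and appending the count to each line on the second. The array name count is just an illustrative choice, and the sample file is a shortened copy of the input above. It assumes whitespace-separated columns and that the file can be read twice (so it would not work on piped input).

```shell
# Small sample file: a subset of the rows from the question.
cat > input.tab <<'EOF'
0 SL3.0ch03 7675648 21M GATCACTCCAAACTCATCATA NM:i:2
0 SL3.0ch03 7675648 21M GATCACTCCAAACTCATCATA NM:i:2
0 SL3.0ch03 7675649 21M ATCACTCCAAACTCATCATAC NM:i:1
EOF

# The file is passed twice as an argument.
# Pass 1: NR==FNR holds only while reading the first argument, so this
#         block just counts each column-3 value and skips to the next line.
# Pass 2: print the original line followed by the count for its column-3 value.
awk 'NR==FNR { count[$3]++; next } { print $0, count[$3] }' input.tab input.tab > output.tab

cat output.tab
```

The count is appended with a single space because that is awk's default output separator. If the real files are tab-delimited, adding -v OFS='\t' to the awk call should append the count with a tab instead.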