I have hundreds of TSV files with the following structure (example):

GH1 123 family1
GH2 23 family2
.
.
.
GH4 45 family4
GH6 34 family6

And I have a text file with a list of words (thousands):

GH1
GH2
GH3
.
.
.
GH1000

I want output that shows how many times each word occurs in each file, like this:

 GH1 GH2 GH3 ... GH1000
filename1 1 1 0... 4
.
.
.
filename2 2 3 1... 0

I tried this code, but it only gives me zeros:

for file in *.tsv; do
    echo $file >> output.tsv
    cat fore.txt | while read line; do
        awk -F "\\t" '{print $1}' $file | grep -wc $line >>output.tsv
        echo "\\t">>output.tsv;
    done ;
done
fedorqui

2 Answers

Use the following script and redirect its stdout to output.txt.

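#!/bin/bash

# Header row: one column per word
while read -r p; do
    echo -n "$p "
done <words.txt

echo ""
for file in *.tsv; do
    echo -n "$file = "
    while read -r p; do
        # -o prints each match on its own line, -w matches whole words only
        # (so GH1 does not match GH10), -F treats the word as a fixed string;
        # wc -l then counts the matches
        COUNT=$(grep -owF -- "$p" "$file" | wc -l)
        echo -n "$COUNT     "
    done <words.txt
    echo ""
done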


Here is a simple Awk script which collects a list like the one you describe.

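awk 'NR==FNR { a[$1] = n = FNR;      # word list: map each word to its column
        printf "\t%s", $1; next }    # and print it in the header row
    FNR==1 {                         # first line of each later file
        if(f) { printf "%s", f;      # report the previous file, if any
            for (i=1; i<=n; i++)
                printf "\t%s", 0+b[i] }
        printf "\n"
        delete b
        f = FILENAME }
    $1 in a { b[a[$1]]++ }' fore.txt *.tsv /etc/motd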

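To avoid repeating the reporting block in END, we append a short sentinel file (here /etc/motd) to the argument list. Its only purpose is to trigger the FNR==1 block once more, so that the counts for the last real file get printed; the sentinel's own counts are never reported. The sentinel must exist and contain at least one line, so substitute another small text file if /etc/motd is missing or empty.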
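
As an alternative to the sentinel file, the reporting code can be moved into an Awk function that is called both when a new file starts and in END. This is only a minimal sketch, assuming the same word list as the first argument and a whitespace-separated first column; count_words and report are hypothetical names, not part of the script above.

```shell
#!/bin/sh
# Sketch: per-file word counts without a sentinel file.
# count_words is a hypothetical helper; usage: count_words WORDLIST FILE...
count_words() {
    awk '
        # Print the finished row for file f, then reset the counts.
        function report(   i) {
            printf "%s", f
            for (i = 1; i <= n; i++) printf "\t%d", b[i]
            printf "\n"
            delete b
        }
        # Word list: remember the column index of each word, print the header.
        NR == FNR { a[$1] = n = FNR; printf "\t%s", $1; next }
        # First line of each data file: end the header or report the previous file.
        FNR == 1 {
            if (f != "") { report() } else { printf "\n" }
            f = FILENAME
        }
        # Count the word in column 1 under its column index.
        $1 in a { b[a[$1]]++ }
        # Report the last file, or just end the header if there were no data files.
        END { if (f != "") { report() } else { printf "\n" } }
    ' "$@"
}
```

It would be called as count_words fore.txt *.tsv >output.tsv. The trade-off is a few more lines of Awk in exchange for not depending on an extra file being present.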

The shell's while read loop is slow, inefficient, and somewhat error-prone (you basically always want read -r, and handling incomplete text files is hairy). In addition, the brute-force approach rereads the word file for every input file and rescans each input file once per word, which incurs a heavy I/O penalty.
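
If a shell loop is unavoidable, the pitfalls mentioned above can be reduced as in the following sketch. It assumes a plain word list with one word per line; print_words is a hypothetical helper that only echoes each line in brackets so the effect is visible.

```shell
#!/bin/sh
# Sketch: read a word list line by line without mangling it.
# print_words is a hypothetical helper; usage: print_words WORDLIST
print_words() {
    # IFS= keeps leading and trailing whitespace, -r keeps backslashes literal,
    # and the || test still processes a final line that lacks a newline.
    while IFS= read -r word || [ -n "$word" ]; do
        printf '[%s]\n' "$word"
    done < "$1"
}
```

This addresses correctness only; the loop is still much slower than a single Awk pass over the files.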

tripleee
  • Hi, I tried this code. It gives the result in tabular format, but the counts are zero for all words. – Hitesh Tikariha Dec 28 '19 at 03:28
  • Does your input file have DOS carriage returns? Take them out and try again. See also https://stackoverflow.com/questions/39527571/are-shell-scripts-sensitive-to-encoding-and-line-endings – tripleee Dec 28 '19 at 08:15
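
Checking for and removing DOS carriage returns, as suggested in the last comment, could look roughly like this. It is a sketch only; has_cr and strip_cr are hypothetical helper names, and tools such as dos2unix do the same job where they are installed.

```shell
#!/bin/sh
# Sketch: detect and strip DOS carriage returns before counting.
# has_cr and strip_cr are hypothetical helper names.
has_cr() {
    # Succeeds if any line of the file contains a carriage return.
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

strip_cr() {
    # Copy stdin to stdout with every carriage return removed.
    tr -d '\r'
}
```

For example, has_cr fore.txt && strip_cr <fore.txt >fore.unix.txt writes a cleaned copy only when the word list actually contains carriage returns.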