1

First of all, thank you for your help. I have the file letter.txt:

 A
 B
 C

And I have the file number.txt:

B  10
D  20
A  15
C  18
E  23
A  12
B  14

I want to count how many times each letter in letter.txt appears in number.txt, so the output will be:

We have found 2 A
We have found 2 B
We have found 1 C
Total letter found: 5

I know I can do it with the code below, but I want a general solution that works with any file.

awk 'BEGIN {A=0; B=0; C=0} $1 == "A" {A++} $1 == "B" {B++} $1 == "C" {C++} END {print "We have found " A " A\nWe have found " B " B\nWe have found " C " C\nTotal letter found: " (A+B+C)}' number.txt
Cyrus
quik1399

4 Answers

2

You basically want to do an inner join (easy enough to google) and group by the join key and return the count for each group.

awk 'NR==FNR { count[$1] = 0; next }
    $1 in count { ++count[$1]; ++total }
    END { for (k in count)
        print "We have found", count[k], k
    print "Total letter found:", total+0 }' letter.txt number.txt

All of this should be easy to find in a basic Awk tutorial, but in brief: FNR, the line number within the current file, is equal to NR, the overall line number, only while we are reading the first input file. For those lines we initialize count with the keys we want to look for, and next skips the rest of the script. If we fall through, we are reading the second file; if the first field is a key we want, we increase its count and the total. When we are done, the END block reports what we found.
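
One detail worth knowing is that Awk does not define the order in which `for (k in count)` visits the keys. The following is a minimal sketch, not part of the answer above, that remembers the keys in the order they appear in the first file. The function name `count_keys`, the `order` array, and the temporary demo files are assumptions made for illustration, and the key file is assumed to be non-empty.

```shell
# Hypothetical demo data: the question's sample files, recreated in a temp directory.
demo_dir=$(mktemp -d)
printf '%s\n' A B C > "$demo_dir/letter.txt"
printf '%s\n' 'B  10' 'D  20' 'A  15' 'C  18' 'E  23' 'A  12' 'B  14' > "$demo_dir/number.txt"

# count_keys KEYFILE DATAFILE (hypothetical helper name)
# Counts how often each key from KEYFILE occurs as the first field of DATAFILE.
count_keys() {
    awk '
        # First file: remember each non-empty key once, in order of appearance.
        NR == FNR {
            if (NF && !($1 in count)) { count[$1] = 0; order[++n] = $1 }
            next
        }
        # Second file: count only first fields that are known keys.
        $1 in count { ++count[$1]; ++total }
        END {
            for (i = 1; i <= n; i++)
                print "We have found", count[order[i]], order[i]
            print "Total letter found:", total + 0
        }
    ' "$1" "$2"
}

count_keys "$demo_dir/letter.txt" "$demo_dir/number.txt"
```

With the sample data this prints the four lines requested in the question, with the letters in the same order as in letter.txt.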

tripleee
1

Consider starting with:

$ join letter.txt <(cut -d' ' -f1 number.txt | sort) | uniq -c
      2 A
      2 B
      1 C

Then:

$ join letter.txt <(cut -d' ' -f1 number.txt | sort) | uniq -c |
    awk '
        { print "We have found", $1, $2; tot+=$1 }
        END { print "Total letter found:", tot+0 }
    '
We have found 2 A
We have found 2 B
We have found 1 C
Total letter found: 5

In reality I'd probably just do it all in awk, though; I just wanted to show an alternative.
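
Note that `join` expects both inputs to be sorted on the join field, and that letters with no match in number.txt do not appear in the output at all. As a rough sketch, assuming letter.txt might not be sorted and that the fields in number.txt are separated by spaces, the pipeline can be wrapped so that it sorts both inputs itself. The function name `join_count` and the temporary files are hypothetical, and the sketch reads the second input from standard input (`-`) rather than using process substitution so that it also runs in plain `sh`.

```shell
# Hypothetical demo data; letter.txt is deliberately unsorted here.
demo_dir=$(mktemp -d)
printf '%s\n' C A B > "$demo_dir/letter.txt"
printf '%s\n' 'B  10' 'D  20' 'A  15' 'C  18' 'E  23' 'A  12' 'B  14' > "$demo_dir/number.txt"

# join_count KEYFILE DATAFILE (hypothetical helper name)
join_count() {
    # join needs sorted input, so sort (and de-duplicate) the keys first.
    keys=$(mktemp)
    sort -u "$1" > "$keys"
    # Take the first field of the data, sort it, join it against the keys,
    # then count the repeated keys and format the report.
    cut -d' ' -f1 "$2" | sort |
        join "$keys" - |
        uniq -c |
        awk '{ print "We have found", $1, $2; tot += $1 }
             END { print "Total letter found:", tot + 0 }'
    rm -f "$keys"
}

join_count "$demo_dir/letter.txt" "$demo_dir/number.txt"
```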

Ed Morton
0

I don't know whether you need awk; to me it is easier (though slower to run, as you can read in the comments) to use grep -c:

cat file1 | while read line; do 
  c=`grep -c $line file2 | sed 's/ //g'`; 
  echo We have found $c $line; 
done

It's a loop in which $c is the count returned by grep -c, and sed removes any spaces from the grep -c output.

  • 1
    This will be quite inefficient. Besides the [useless `cat`](https://stackoverflow.com/questions/11710552/useless-use-of-cat) you will want to fix the [quoting errors.](https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable) – tripleee May 15 '21 at 08:41
  • How would you write it faster? Do you mean quoting with `, ' or "? – Daniele Rugginenti May 15 '21 at 08:46
  • You generally want double quotes around all shell variables unless you specifically require the shell to perform whitespace tokenization and wildcard expansion on the value. Click the link for details; that's why I provided it. – tripleee May 15 '21 at 08:48
  • 1
    You are running `grep` again and again on a potentially large file. The Awk solution I provided does all the searching in a single pass over the target file. This is a common FAQ. – tripleee May 15 '21 at 08:50
  • OK, that's fantastic. Great solution. Awk is the best, but I have never used it much, just the basics. Thanks for your tips. – Daniele Rugginenti May 15 '21 at 08:59
  • 1
    Quoting errors are very common, even in production scripts by ostensibly professional developers. You would do well to spend some time on understanding how it works; but if you can't or don't want to, http://shellcheck.net/ can offer simple fixes for many common shell scripting errors. – tripleee May 15 '21 at 09:10
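
For illustration, here is one way the loop could look with the points from these comments applied: the key file is redirected into the loop instead of being piped from `cat`, every variable is double-quoted, and `$(...)` replaces the backticks. This is only a sketch under a few assumptions: the keys contain no regular-expression metacharacters, each key in the data file is followed by whitespace, and the name `grep_count` and the demo files are hypothetical. It still runs `grep` once per key, so the single-pass Awk approach remains the faster one for large files.

```shell
# Hypothetical demo data: the question's sample files, recreated in a temp directory.
demo_dir=$(mktemp -d)
printf '%s\n' A B C > "$demo_dir/letter.txt"
printf '%s\n' 'B  10' 'D  20' 'A  15' 'C  18' 'E  23' 'A  12' 'B  14' > "$demo_dir/number.txt"

# grep_count KEYFILE DATAFILE (hypothetical helper name)
grep_count() {
    total=0
    # Redirect the key file into the loop; no cat needed.
    while IFS= read -r key; do
        # Skip empty lines in the key file.
        [ -n "$key" ] || continue
        # Count lines that start with the key followed by whitespace.
        # "|| true" keeps a zero count (grep exit status 1) from aborting
        # a script that runs under "set -e".
        c=$(grep -c -- "^${key}[[:space:]]" "$2" || true)
        echo "We have found $c $key"
        total=$((total + c))
    done < "$1"
    echo "Total letter found: $total"
}

grep_count "$demo_dir/letter.txt" "$demo_dir/number.txt"
```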
0

grep and coreutils can also do this:

grep -f letter.txt number.txt | cut -d' ' -f1 | sort | uniq -c

Output:

      2 A
      2 B
      1 C
Thor
  • This finds the letters anywhere in the line, not just the first field. Fine if the other fields are all-numeric and the keys are always letters, for example; but hard to adapt if not. – tripleee May 15 '21 at 13:28
  • 1
    @tripleee: according to OP, there are only letters in the first field – Thor May 15 '21 at 13:47
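
If the keys could also occur in later fields, one possible adaptation is to anchor each pattern to the start of the line before handing the patterns to `grep -f`. The sketch below assumes that the keys contain no regular-expression metacharacters and that whitespace follows the first field; the function name `first_field_count`, the temporary pattern file, and the extra `E  B2` line in the demo data are made up for illustration.

```shell
# Hypothetical demo data: the sample files plus one line ("E  B2") where a key
# occurs outside the first field.
demo_dir=$(mktemp -d)
printf '%s\n' A B C > "$demo_dir/letter.txt"
printf '%s\n' 'B  10' 'D  20' 'A  15' 'C  18' 'E  23' 'A  12' 'B  14' 'E  B2' > "$demo_dir/number.txt"

# first_field_count KEYFILE DATAFILE (hypothetical helper name)
first_field_count() {
    patterns=$(mktemp)
    # Turn each key K into the pattern "^K[[:space:]]", so that it can only
    # match at the start of a line, followed by whitespace.
    sed 's/.*/^&[[:space:]]/' "$1" > "$patterns"
    grep -f "$patterns" "$2" | cut -d' ' -f1 | sort | uniq -c
    rm -f "$patterns"
}

first_field_count "$demo_dir/letter.txt" "$demo_dir/number.txt"
```

With the unanchored patterns, the `E  B2` line would match the key B and be counted under E; with the anchored patterns it is skipped.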