4

I am trying to count the occurrences of a word in a file.

If word occurs multiple times in a line, I will count is a 1.

Following command will give me the output but will fail if line has multiple occurrences of word

grep -c "word" filename.txt

Is there any one liner?

benka
hardy_sandy
  • possible duplicate [Calculate Word occurrences from file in bash](http://stackoverflow.com/questions/11850823/calculate-word-occurrences-from-file-in-bash) – jbh Feb 06 '14 at 12:55
  • Does "I will count is a 1." mean "I will count it as 1" or "I will count each as 1" ? – Guntram Blohm Feb 06 '14 at 12:58

5 Answers

20

You can use grep -o to show the exact matches and then count them:

grep -o "word" filename.txt | wc -l

Test

$ cat a
hello hello how are you
hello i am fine
but
this is another hello

$ grep -c "hello" a    # Normal `grep -c` fails
3

$ grep -o "hello" a 
hello
hello
hello
hello
$ grep -o "hello" a | wc -l   # grep -o solves it!
4
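
One caveat worth keeping in mind: `grep -o "word"` also matches inside longer words, such as the `hello` in `othello`. If only whole words should count, adding `-w` (available in GNU and BSD grep) is one option. A minimal sketch, using a throwaway sample file whose contents are made up for the demonstration:

```shell
# Create a temporary sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' 'hello hello how are you' 'othello says hello' 'but' > "$sample"

# Substring matches: this also counts the "hello" inside "othello".
substring_count=$(grep -o 'hello' "$sample" | wc -l)

# Whole-word matches only: -w rejects matches embedded in a longer word.
word_count=$(grep -ow 'hello' "$sample" | wc -l)

echo "substring: $substring_count, whole word: $word_count"
rm -f "$sample"
```

Which of the two counts is the right one depends on whether embedded matches should be included.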
fedorqui
3

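Set RS (the record separator) to the word in awk for a shorter one. This relies on awk accepting a multi-character RS, which GNU awk does; POSIX leaves that case unspecified.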

awk 'END{print NR-1}' RS="word" file
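
Where a multi-character `RS` is not available, counting substitutions with `gsub` is one possible alternative, since `gsub` returns the number of replacements it made. A sketch under that assumption, again with a made-up sample file:

```shell
# Temporary sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' 'hello hello how are you' 'hello i am fine' 'but' 'this is another hello' > "$sample"

# gsub replaces every match with itself ("&") and returns how many it replaced.
# n + 0 forces a numeric 0 when nothing matched at all.
count=$(awk '{ n += gsub(/hello/, "&") } END { print n + 0 }' "$sample")

echo "$count"
rm -f "$sample"
```

Like `grep -o`, this counts substring matches, not whole words.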
BMW
2

GNU awk allows it to be done in a single command, without the use of multiple piped commands:

awk -v w="word" '$1==w{n++} END{print n+0}' RS=' |\n' file
anubhava
1

cat file | tr ' ' '\n' | grep -c word

This assumes that all words in the file have spaces between the words. If there's punctuation concatenating the word to itself, or otherwise no spaces on a single line between the word and itself, they'll count as one.
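
To loosen the spaces-only assumption, one option is to turn every run of non-letter characters into a newline first and then count exact-line matches. A sketch, assuming ASCII letters are the only word characters; the sample file is invented for the demonstration:

```shell
# Temporary sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf '%s\n' 'word, word; another word.' 'swordfish and words' > "$sample"

# tr -c complements the set (everything that is not a letter) and -s squeezes
# repeated newlines, so each word ends up on its own line.
# grep -x matches whole lines only and -c counts them.
count=$(tr -cs 'A-Za-z' '\n' < "$sample" | grep -cx 'word')

echo "$count"
rm -f "$sample"
```

Because of `-x`, `swordfish` and `words` are not counted here.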

atk
  • how about `tr " " "\n"< file |grep -c "word"` – BMW Feb 07 '14 at 03:54
  • I think `grep -o '[^ \t\n,.]\+'` would let you specify word separators, then use `wc -l` – coya Apr 22 '16 at 16:08
  • Sorry, missed the -P option in the regexp. See: http://stackoverflow.com/questions/1825552/grep-a-tab-in-unix for more info – coya Apr 22 '16 at 16:27
-1

grep word filename.txt | wc -l

grep prints the lines that match, then wc -l prints the number of lines matched

Michael
  • 2
    It does not count reoccurrences of words in the same line. This counts how many lines have the word in them – jbh Feb 06 '14 at 12:57
  • 1
    @GuntramBlohm No it does not. Given my sample file, it would return 3 instead of 4. – fedorqui Feb 06 '14 at 12:57
  • "I will count is a 1." would mean, to me, he wants multiple words on the same line count only once. – Guntram Blohm Feb 06 '14 at 12:59
  • 1
    However, read the "Following command will give me the output but will fail if line has multiple occurrences of word." I think he probably meant to say "If a word occurs multiple times in a line, it will count it as 1" – jbh Feb 06 '14 at 13:01
  • 1
    yes, he meant that "up to now, if multiple occurrences on one line it counts it as one" and therefore he is looking for a better solution (one that counts occurrences of the word, not of lines containing the word) (hence the question. Otherwise, his "grep -c" would already be the answer). – Olivier Dulac Feb 06 '14 at 13:19