How can I count the occurrences of a string within a file?

Question

Take this code as an example. Treating it as an HTML/text file, if I want to know the total number of times that echo appears, how can I do it using bash?

new_user()
{
    echo "Preparing to add a new user..."
    sleep 2
    adduser     # run the adduser program
}

echo "1. Add user"
echo "2. Exit"

echo "Enter your choice: "
read choice


case $choice in
    1) new_user     # call the new_user() function
       ;;
    *) exit
       ;;
esac

score 198 · Answer 1 · edited Apr 17 '15 at 15:20

The number of string occurrences (not lines) can be obtained using grep with -o option and wc (word count):

$ echo "echo 1234 echo" | grep -o echo
echo
echo
$ echo "echo 1234 echo" | grep -o echo | wc -l
2

So the full solution for your problem would look like this:

$ grep -o "echo" FILE | wc -l
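A minimal sketch of the same pipeline wrapped in a reusable shell function. The function name `count_occurrences` and the sample text are illustrative assumptions, not part of the answer. `-F` makes grep treat the search string literally instead of as a regex, and `tr -d ' '` strips the leading spaces that BSD/macOS `wc -l` pads its output with.

```shell
#!/bin/sh
# count_occurrences is a hypothetical helper name, not a standard command.
# It counts every non-overlapping occurrence of a literal string in a file,
# rather than the number of lines that contain it.
count_occurrences() {
    # -o prints each match on its own line, -F disables regex metacharacters,
    # "--" protects patterns that start with "-"; wc -l counts the matches.
    grep -o -F -- "$1" "$2" | wc -l | tr -d ' '
}

# Sample input: three "echo"s on the first line, one on the second.
sample=$(mktemp)
printf 'echo time echo a echo\necho "done"\n' > "$sample"

count_occurrences echo "$sample"   # prints 4 (occurrences)
grep -c echo "$sample"             # prints 2 (matching lines)

rm -f "$sample"
```

If only the whole word should count (so that "echoing" is skipped), GNU and BSD grep also accept `-w`.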

edited Apr 17 '15 at 15:20

Jon Schneider

answered Jan 24 '13 at 21:00

Dmitry

I feel it's a simple solution; not sure about the time complexity. – kishorebjv Jan 29 '18 at 06:36

Be careful: if grep thinks the file is "binary", you'll get "1" as the output every time. Add `-a` just to be safe if you like... – rogerdpack May 11 '20 at 16:35

I think this is a better answer than the accepted answer. – Frans Jul 01 '21 at 07:14
This gives an incorrect (smaller) count for a 6GB XML file that consists of a single line. https://stackoverflow.com/a/58994637/5524175 worked for me. This is my command: `grep -o '' file.xml | wc -l`. I'm on macOS. – duplex143 Aug 24 '23 at 09:28
Related: https://stackoverflow.com/questions/76968047/grep-command-for-counting-no-of-occurrences-of-a-string-in-a-file-giving-lesser – duplex143 Aug 24 '23 at 16:51

Manny D · Accepted Answer · score 121 · 2011-07-19T03:40:38.000

This will output the number of lines that contain your search string.

grep -c "echo" FILE

This won't, however, count the number of occurrences in the file (i.e., if echo appears multiple times on one line, that line is still counted only once).

edit:

After playing around a bit, you could get the number of occurrences using this dirty little bit of code:

sed 's/echo/echo\n/g' FILE | grep -c "echo"

This basically adds a newline following every instance of echo so they're each on their own line, allowing grep to count those lines. You can refine the regex if you only want the word "echo", as opposed to "echoing", for example.
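The `\n` in that replacement is a GNU sed extension; BSD/macOS sed does not turn it into a newline. A sketch of a more portable variant, assuming only POSIX sed: in the replacement, `&` stands for the matched text, and a backslash followed by an actual newline inserts a newline. The function name `split_after_echo` is hypothetical.

```shell
#!/bin/sh
# split_after_echo is a hypothetical helper name.
# POSIX sed: "&" re-inserts the match, and a backslash at the end of the
# line makes the following newline part of the replacement text.
# The "/g'" line starts in column 0; any indentation would be copied too.
split_after_echo() {
    sed 's/echo/&\
/g'
}

# Every "echo" now ends a line, so counting matching lines counts occurrences.
printf 'echo time echo a echo\n' | split_after_echo | grep -c echo   # prints 3
```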

edited Jul 19 '11 at 03:40

answered Jul 19 '11 at 03:30

Manny D

So what can I do if there are several echos on the same line? e.g. echo time echo a echo – Leo Chan Jul 19 '11 at 03:36
I've updated my response which should hopefully work for you. – Manny D Jul 19 '11 at 03:42
Thanks. If you can spare a few more minutes, one more question: if I would like to delete the third occurrence of echo, what can I do? – Leo Chan Jul 19 '11 at 03:43
@foodil: Remove 3rd echo: `sed -e 's/echo//3'` (note: this removes the third echo on each line, not the third in the whole file) – Prince John Wesley Jul 19 '11 at 03:49
Can I assign the number of occurrences to a variable like this? `noOfTable1=grep -c "table_1row" /var/www/html/INFOSEC/english/news/test.html` Thanks. – Leo Chan Jul 19 '11 at 03:49
@foodil: `noOfTable1=$(grep ...)` – Prince John Wesley Jul 19 '11 at 03:55
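A short sketch of storing the count in a shell variable. Backticks and `$( )` are two alternative forms of command substitution, so only one of them is needed. The sample HTML line and the temporary file stand in for the commenter's real page and are assumptions.

```shell
#!/bin/sh
# Temporary stand-in for the HTML file mentioned in the comments.
page=$(mktemp)
printf '<tr class="table_1row"></tr><tr class="table_1row"></tr>\n' > "$page"

# $( ) runs the command and substitutes its output into the assignment.
lines=$(grep -c "table_1row" "$page")                          # matching lines
noOfTable1=$(grep -o "table_1row" "$page" | wc -l | tr -d ' ') # occurrences

echo "lines=$lines occurrences=$noOfTable1"   # prints lines=1 occurrences=2
rm -f "$page"
```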

score 3 · Answer 3 · answered Nov 22 '19 at 12:49

None of the existing answers worked for me with a single-line 10GB file. Grep runs out of memory even on a machine with 768 GB of RAM!

$ cat /proc/meminfo | grep MemTotal
MemTotal:       791236260 kB
$ ls -lh test.json
-rw-r--r-- 1 me all 9.2G Nov 18 15:54 test.json
$ grep -o '0,0,0,0,0,0,0,0,' test.json  | wc -l
grep: memory exhausted
0

So I wrote a very simple Rust program to do it.

Install Rust.
cargo install count_occurences

$ count_occurences '0,0,0,0,0,0,0,0,' test.json
99094198

It's a little slow (1 minute for 10GB), but at least it doesn't run out of memory!

answered Nov 22 '19 at 12:49

Timmmm

How do I see the source code of count_occurences? – duplex143 Aug 24 '23 at 09:12
Nvm. I figured this out - https://users.rust-lang.org/t/how-to-see-the-code-for-crate-published-in-crate-io/1897/2 – duplex143 Aug 24 '23 at 09:20
https://github.com/Timmmm/count_occurences/blob/master/src/main.rs – Timmmm Aug 24 '23 at 12:32
@Timmmm : if you're having `10GB` single line files, time to revisit and restructure the data – RARE Kpop Manifesto Aug 24 '23 at 15:12
The fact that it was on a single line is irrelevant (JSON decoders don't care). 10GB of JSON is more or less unworkable no matter how many new lines it has in it! We switched to SQLite in the end. – Timmmm Aug 24 '23 at 20:41

score 1 · Answer 4 · answered Jul 19 '11 at 03:32

I'm taking some guesses here, because I don't quite understand what you're asking.

I think that what you want is a count of the number of lines on which the pattern 'echo' appears in the given file.

I've pasted your sample text into a file called 6741967.

First, grep finds the matches:

james@Brindle:tmp$grep echo 6741967 
    echo "Preparing to add a new user..."
echo "1. Add user"
echo "2. Exit"
echo "Enter your choice: "

Second, use wc -l to count the lines

james@Brindle:tmp$grep echo 6741967  | wc -l
       4
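For counting matching lines, `grep -c` gives the same number as piping grep into `wc -l`, without the extra process and without the padding that BSD/macOS `wc` adds (as in the `       4` above). A small sketch with made-up sample text; note that both forms count a line containing two echos only once.

```shell
#!/bin/sh
# Made-up sample: the first line contains "echo" twice.
sample=$(mktemp)
printf 'echo one echo two\nno match here\necho three\n' > "$sample"

# Both commands count LINES that match, not individual occurrences.
piped=$(grep echo "$sample" | wc -l | tr -d ' ')   # tr removes wc's padding
direct=$(grep -c echo "$sample")

echo "$piped $direct"   # prints 2 2
rm -f "$sample"
```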

answered Jul 19 '11 at 03:32

James Polley

Thanks for your help. Sorry for the confusion. My question is how to count the number of occurrences in the file. – Leo Chan Jul 19 '11 at 03:35

If you run `grep -o echo 6741967`, it outputs each match on its own line, so `grep -o echo 6741967 | wc -l` will account for multiple 'echo's on a single line as well – Wivlaro Jan 03 '13 at 16:33

Ed Morton · Answer 5 · 2023-08-24T12:14:14.770

Using GNU awk for multi-char RS:

awk -v RS='echo' 'END{print NR - (NR ? 1 : 0)}' file

With the above we're counting the number of whatever...echo "records" in the input. The - (NR ? 1 : 0) is so we don't count the string after the last echo in the input (input foo...echo...bar should report 1, not 2) and so we print 0 instead of -1 for an empty input file.

Since the above is reading each echo-separated string one at a time it will handle very large files containing multiple echos better than grep -o echo which apparently tries to read the whole input into memory at once and then split it up.
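If GNU awk is not available, here is a sketch of a different technique that should work in any POSIX awk: `gsub()` returns the number of substitutions it made, so summing it over all lines counts the matches. Unlike the multi-char RS approach, this still reads one whole line at a time, so it would not avoid the memory problem with a single huge line. The function name `count_with_awk` and the hard-coded pattern are illustrative.

```shell
#!/bin/sh
# count_with_awk is a hypothetical helper name; it reads files given as
# arguments, or standard input when there are none.
# gsub() returns how many substitutions it made; replacing each match with
# "&" (the match itself) leaves the line unchanged.
count_with_awk() {
    awk '{ n += gsub(/echo/, "&") } END { print n + 0 }' "$@"
}

printf 'echo time echo a echo\necho\n' | count_with_awk   # prints 4
```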

score -2 · Answer 6 · answered Jun 26 '18 at 07:25

If you just want the number of occurrences, then you can do this: $ grep -c "string_to_count" file_name

answered Jun 26 '18 at 07:25

beginner

Won't count the string happening twice on the same line correctly. – Josiah Nov 20 '18 at 15:29
