
I need to count the number of occurrences of a char in a string using Bash.

In the following example, when the character is (for example) t, it echoes the correct number of occurrences of t in var, but when the character is a comma or a semicolon, it prints zero:

var = "text,text,text,text" 
num = `expr match $var [,]`
echo "$num"
nbro

10 Answers


You can, for example, remove all other characters and count what remains, like:

var="text,text,text,text"
res="${var//[^,]}"
echo "$res"
echo "${#res}"

will print

,,,
3

or

tr -dc ',' <<<"$var" | awk '{ print length; }'

or

tr -dc ',' <<<"$var" | wc -c    #works, but i don't like wc.. ;)

or

awk -F, '{print NF-1}' <<<"$var"

or

grep -o ',' <<<"$var" | grep -c .

or

perl -nle 'print s/,//g' <<<"$var"
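The same idea extends from single characters to multi-character substrings; a quick sketch (my own addition, not part of the answer) using the grep variant, here counting occurrences of `te`:

```shell
var="text,text,text,text"
# grep -o prints one match per line, so counting lines counts matches
grep -o 'te' <<<"$var" | wc -l    # prints 4
```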
clt60
  • 1
    [some more trick here](http://www.cyberciti.biz/faq/unix-linux-appleosx-bsd-bash-count-characters-variables/) like `y="${x//[^s|S]}"; echo "${#y}"` – Aquarius Power Jun 13 '14 at 00:11
  • 7
    use the first one, should always avoid resorting to spawning another process to do work like this, it can severely impact performance when using with large iteration loops. As a rule external process execution should be a last resort when using iterating or repeating operations. – osirisgothra Jun 21 '14 at 10:17
  • 1
    @CiroSantilli六四事件法轮功包卓轩because for example `echo -n some line | wc -l` – clt60 Nov 20 '15 at 02:04
  • Code block 4 is the best in my opinion. We need to make it easier to get to: `tr -dc ',' <<<"$var" | wc -c` – bgStack15 Feb 26 '16 at 14:57
  • 2
    @bgStack15 code block 1 does not initial additional process, and maybe has better performance if you have huge number of line to parsing. – petertc Apr 01 '16 at 06:01
  • Just FYI, `tr -dc ',' <<<"$var" | awk '{ print length; }'` seems to be way faster than the first option. – Bhushan Jul 20 '16 at 06:21
  • Note that `sed 's/[^,]//g'` is much faster than using `${var//[^,]}`. So, for example, checking the number of options would be much faster this way: `echo $allDomains | sed 's/[^-]//g' | wc -c` – Nux Jul 10 '18 at 14:10
  • after taking [benchmark](https://transang.me/performance-benchmark-count-number-of-occurrences-of-a-character/), I got the methods list after sorted from fastest to slowest: *variable expansion* > *tr* > *grep* > *awk* > *perl* – Sang Nov 12 '19 at 06:15
  • @transang your "benchmark" benchmarks nothing. Reading a text file line by line, executing the commands a zillion times, and summing the counts in the shell is an example of the worst programming practices. Of course the results are absolutely pointless. If you want to count commas in some file, you should NOT do it line by line. – clt60 Nov 20 '19 at 12:03
  • @jm666 you are correct to advise me not to read the file line by line. I also figured out that the reason for my code's slow speed is the line-by-line reading. However, my benchmark also includes a read-only experiment that shows the reading time; you can subtract that reading time from all the others to get the processing time. – Sang Nov 21 '19 at 04:23
  • @transang You need to subtract not only the reading time but (most importantly) the command start-up time too (aka fork-exec). For example, `perl`'s start-up time is much bigger than, say, `tr`'s; therefore you got the result that perl is worst. But with the correct approach you need to start perl only **once**, and the results will be very different. The same applies to `awk`. Be fair and tell the truth on your web page, or develop code where the WHOLE FILE is processed in one fork-exec cycle; then you can safely publish the results. Not in the current case. – clt60 Nov 22 '19 at 05:04
  • You are right: not only file open/close but the start-up time of the program is also an important factor. At least when the start-up time is similar to the processing time, my benchmark result is useful. Let me point that out in my post. – Sang Nov 22 '19 at 06:22
  • This solution is **very** slow if the string you are processing is very large. – Matthew Snyder Feb 26 '21 at 16:22
  • @MatthewSnyder Which one do you mean by "this" of the listed 7 different methods? And how large is the "very large"? – clt60 Mar 01 '21 at 13:29
  • Why do you not like wc? – Robert Dec 29 '21 at 11:22
  • 1
    @Robert becase the `wc` incorrectly counts the number of lines, in the input. Of course, the `wc -c` is OK but because of the line counting problem I don't like it. example, the: `printf "line1\nline2\n" | wc -l` prints `2` but the `printf "line1\nline2" | wc -l` prints only `1`. – clt60 Jan 01 '22 at 09:10

I would use the following awk command:

string="text,text,text,text"
char=","
awk -F"${char}" '{print NF-1}' <<< "${string}"

I'm splitting the string by $char and printing the number of resulting fields minus one.

If your shell does not support the <<< operator, use echo:

echo "${string}" | awk -F"${char}" '{print NF-1}'
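One edge case worth noting (my own observation, not from the answer): on an empty record NF is 0, so NF-1 prints -1 rather than 0. A guarded sketch:

```shell
string=""
char=","
# Print 0 for an empty record instead of NF - 1 = -1
awk -F"${char}" '{ print (NF ? NF - 1 : 0) }' <<< "${string}"   # prints 0
```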
hek2mgl

You can do it by combining tr and wc commands. For example, to count e in the string referee

echo "referee" | tr -cd 'e' | wc -c

output

4

Explanation: tr -cd 'e' removes all characters other than e, and wc -c counts the remaining characters.

Multiple lines of input are also fine with this solution: for example, cat mytext.txt | tr -cd 'e' | wc -c counts the occurrences of e in the file mytext.txt, even though the file may contain many lines.

*** Update ***

To remove the multiple spaces in front of the number (@tom10271), simply append a piped tr command:

 tr -d ' '

For example:

echo "referee" | tr -cd 'e' | wc -c | tr -d ' '
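The file case mentioned above also works without cat, by redirecting stdin; a small self-contained demo (the file name and contents are just for illustration):

```shell
printf 'referee\nbee\n' > mytext.txt   # sample two-line file: 4 + 2 = 6 e's
tr -cd 'e' < mytext.txt | wc -c        # prints 6
rm mytext.txt
```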
Robin Hsu
  • 1
    In macOS, output contains mulitple spaces in front of the number – tom10271 Sep 25 '20 at 02:06
  • Great, many thanks for the `tr`; it really sounds like a great tool I didn't know about. Today I learned something new! – j.c Jun 08 '22 at 09:54

awk is very cool, but why not keep it simple?

num=$(echo "$var" | grep -o "," | wc -l)
Matthew Snyder

Building on everyone's great answers and comments, this is the shortest and sweetest version:

grep -o "$needle" <<< "$haystack" | wc -l
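Two caveats worth adding (my own notes, not from the answer): grep treats $needle as a regex, so pass -F when it is a literal character; and with zero matches grep exits non-zero, which can abort a script running under set -e -o pipefail:

```shell
haystack="text,text,text,text"
needle="."                                   # as a regex, '.' matches any character
grep -o  "$needle" <<< "$haystack" | wc -l   # prints 19 (every character matches)
grep -oF "$needle" <<< "$haystack" | wc -l   # prints 0  (no literal dots)
```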

rmanna

Also check this out. For example, to count t:

echo "test" | awk -v RS='t' 'END{print NR-1}'

or in Python

python3 -c 'print("this is for test".count("t"))'

or even better, we can make our script dynamic with awk

echo 'test' | awk '{for (i=1 ; i<=NF ; i++) array[$i]++ } END{ for (char in array) print char,array[char]}' FS=""

in this case the output is:

e 1
s 1
t 2
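A similar per-character frequency table can be built without awk; a sketch of my own using grep -o . to split the string into one character per line:

```shell
# Each character becomes its own line, then uniq -c counts duplicates
echo "test" | grep -o . | sort | uniq -c
#   1 e
#   1 s
#   2 t
```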
Freeman

awk works well if your server has it

var="text,text,text,text" 
num=$(echo "${var}" | awk -F, '{print NF-1}')
echo "${num}"
hek2mgl

I would suggest the following:

var="any given string"
N=${#var}
G=${var//g/}
G=${#G}
(( G = N - G ))
echo "$G"

No call to any other program
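The same trick generalizes to any character by wrapping it in a function (the name count_char is my own, not from the answer):

```shell
# Count occurrences of the single character $2 in string $1,
# using only parameter expansion -- no external processes.
count_char() {
  local stripped=${1//"$2"/}           # remove every occurrence of $2
  echo $(( ${#1} - ${#stripped} ))     # difference = number of removals
}

count_char "any given string" "g"      # prints 2
count_char "text,text,text,text" ","   # prints 3
```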

Ali
Mathew P V

The awk solutions provided here so far all break if there's a line break in your text. E.g.:

text="one,two,thr
ee,four"
DELIM=','
count=$( awk -F"$DELIM" '{print NF-1}' <<<"${text}" )
echo $count

Result:

2
1

The solution that will also work correctly with line breaks is:

text="one,two,thr
ee,four"
DELIM=','
count=$( awk 'BEGIN{RS="'"$DELIM"'";FS=""}END{print NR-1}' <<<"${text}" )
echo $count

Result is 3.
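Another line-break-safe variant (my own sketch, not from the answer) sums the per-line substitution counts that gsub returns:

```shell
text="one,two,thr
ee,four"
# gsub returns the number of substitutions made on each line;
# summing across all lines counts every comma, line breaks included.
awk '{ n += gsub(/,/, "") } END { print n }' <<<"$text"   # prints 3
```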

Mecki

Count fixed strings (-F) from a file:

searchpattern=','

echo "text,text,text,text" | tr ',' '\n' | sed 's/$/,/' > filename

count=$(grep -F "$searchpattern" filename | wc -l)

echo "$count-1" | bc