
I need to count the number of occurrences of a char in a string using Bash.

In the following example, when the character is (for example) t, it echoes the correct number of occurrences of t in var, but when the character is a comma or a semicolon, it prints zero:

var = "text,text,text,text" 
num = `expr match $var [,]`
echo "$num"
nbro

10 Answers


You can, for example, remove all other characters and count what remains, like:

var="text,text,text,text"
res="${var//[^,]}"
echo "$res"
echo "${#res}"

will print

,,,
3

or

tr -dc ',' <<<"$var" | awk '{ print length; }'

or

tr -dc ',' <<<"$var" | wc -c    #works, but i don't like wc.. ;)

or

awk -F, '{print NF-1}' <<<"$var"

or

grep -o ',' <<<"$var" | grep -c .

or

perl -nle 'print s/,//g' <<<"$var"
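The same idea extends from single characters to multi-character substrings; a quick sketch (my own addition, not part of the answer) using the grep variant, here counting occurrences of `te`:

```shell
var="text,text,text,text"
# grep -o prints one match per line, so counting lines counts matches
grep -o 'te' <<<"$var" | wc -l    # prints 4
```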
clt60
  • 1
    [some more trick here](http://www.cyberciti.biz/faq/unix-linux-appleosx-bsd-bash-count-characters-variables/) like `y="${x//[^s|S]}"; echo "${#y}"` – Aquarius Power Jun 13 '14 at 00:11
  • 7
    use the first one, should always avoid resorting to spawning another process to do work like this, it can severely impact performance when using with large iteration loops. As a rule external process execution should be a last resort when using iterating or repeating operations. – osirisgothra Jun 21 '14 at 10:17
  • 1
    @CiroSantilli六四事件法轮功包卓轩because for example `echo -n some line | wc -l` – clt60 Nov 20 '15 at 02:04
  • Code block 4 is the best in my opinion. We need to make it easier to get to: `tr -dc ',' <<<"$var" | wc -c` – bgStack15 Feb 26 '16 at 14:57
  • 2
    @bgStack15 code block 1 does not initial additional process, and maybe has better performance if you have huge number of line to parsing. – petertc Apr 01 '16 at 06:01
  • Just FYI, `tr -dc ',' <<<"$var" | awk '{ print length; }'` seems to be way faster than the first option. – Bhushan Jul 20 '16 at 06:21
  • Note that `sed 's/[^,]//g'` is much faster than using `${var//[^,]}`. So, for example, checking the number of options would be much faster this way: `echo $allDomains | sed 's/[^-]//g' | wc -c` – Nux Jul 10 '18 at 14:10
  • after taking [benchmark](https://transang.me/performance-benchmark-count-number-of-occurrences-of-a-character/), I got the methods list after sorted from fastest to slowest: *variable expansion* > *tr* > *grep* > *awk* > *perl* – Sang Nov 12 '19 at 06:15
  • @transang your "benchmark" benchmarks nothing. Reading a text file line by line, executing the commands a zillion times, and summing the counts in the shell is an example of the worst programming practices. Of course the results are absolutely pointless. If you want to count commas in some file, you should NOT do it line by line. – clt60 Nov 20 '19 at 12:03
  • @jm666 you are correct to advise me not to read the file line by line. I also figured out that the reason for my code's slow speed is the line-by-line reading. However, my benchmark also includes a read-only experiment that shows the reading time; you can subtract that reading time from all the others to get the processing time. – Sang Nov 21 '19 at 04:23
  • @transang You need to subtract not only the reading time but (most importantly) the command start-up time too (aka fork-exec). For example, `perl`'s start-up time is much bigger than, say, `tr`'s; therefore you got the result that perl is worst. But with the correct approach you need to start perl only **once**, and the results will be very different. The same applies to `awk`. Be fair and tell the truth on your web page, or develop code where the WHOLE FILE is processed in one fork-exec cycle; then you can safely publish the results. Not in the current case. – clt60 Nov 22 '19 at 05:04
  • You are right: not only file open/close but the start-up time of the program is also an important factor. At least when the start-up time is similar to the processing time, my benchmark result is useful. Let me point that out in my post. – Sang Nov 22 '19 at 06:22
  • This solution is **very** slow if the string you are processing is very large. – Matthew Snyder Feb 26 '21 at 16:22
  • @MatthewSnyder Which one do you mean by "this" of the listed 7 different methods? And how large is the "very large"? – clt60 Mar 01 '21 at 13:29
  • Why do you not like wc? – Robert Dec 29 '21 at 11:22
  • 1
    @Robert becase the `wc` incorrectly counts the number of lines, in the input. Of course, the `wc -c` is OK but because of the line counting problem I don't like it. example, the: `printf "line1\nline2\n" | wc -l` prints `2` but the `printf "line1\nline2" | wc -l` prints only `1`. – clt60 Jan 01 '22 at 09:10

I would use the following awk command:

string="text,text,text,text"
char=","
awk -F"${char}" '{print NF-1}' <<< "${string}"

I'm splitting the string by $char and printing the number of resulting fields minus one.

If your shell does not support the <<< operator, use echo:

echo "${string}" | awk -F"${char}" '{print NF-1}'
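One edge case worth noting (my own observation, not from the answer): on an empty record NF is 0, so NF-1 prints -1 rather than 0. A guarded sketch:

```shell
string=""
char=","
# Print 0 for an empty record instead of NF - 1 = -1
awk -F"${char}" '{ print (NF ? NF - 1 : 0) }' <<< "${string}"   # prints 0
```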
hek2mgl

You can do it by combining tr and wc commands. For example, to count e in the string referee

echo "referee" | tr -cd 'e' | wc -c

output

4

Explanation: tr -cd 'e' removes all characters other than e, and wc -c counts the remaining characters.

Multiple lines of input are also fine with this solution: for example, cat mytext.txt | tr -cd 'e' | wc -c counts the occurrences of e in the file mytext.txt, even though the file may contain many lines.

*** Update ***

To remove the multiple spaces in front of the number (@tom10271), simply append a piped tr command:

 tr -d ' '

For example:

echo "referee" | tr -cd 'e' | wc -c | tr -d ' '
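The file case mentioned above also works without cat, by redirecting stdin; a small self-contained demo (the file name and contents are just for illustration):

```shell
printf 'referee\nbee\n' > mytext.txt   # sample two-line file: 4 + 2 = 6 e's
tr -cd 'e' < mytext.txt | wc -c        # prints 6
rm mytext.txt
```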
Robin Hsu
  • 1
    In macOS, output contains mulitple spaces in front of the number – tom10271 Sep 25 '20 at 02:06
  • Great, many thanks for the `tr`; it really sounds like a great tool I didn't know about. Today I learned something new! – j.c Jun 08 '22 at 09:54

awk is very cool, but why not keep it simple?

num=$(echo "$var" | grep -o "," | wc -l)
Matthew Snyder

Building on everyone's great answers and comments, this is the shortest and sweetest version:

grep -o "$needle" <<< "$haystack" | wc -l
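Two caveats worth adding (my own notes, not from the answer): grep treats $needle as a regex, so pass -F when it is a literal character; and with zero matches grep exits non-zero, which can abort a script running under set -e -o pipefail:

```shell
haystack="text,text,text,text"
needle="."                                   # as a regex, '.' matches any character
grep -o  "$needle" <<< "$haystack" | wc -l   # prints 19 (every character matches)
grep -oF "$needle" <<< "$haystack" | wc -l   # prints 0  (no literal dots)
```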

rmanna

Also check this out. For example, to count t:

echo "test" | awk -v RS='t' 'END{print NR-1}'

or in Python

python3 -c 'print("this is for test".count("t"))'

or even better, we can make our script dynamic with awk

echo 'test' | awk '{for (i=1 ; i<=NF ; i++) array[$i]++ } END{ for (char in array) print char,array[char]}' FS=""

in this case the output is:

e 1
s 1
t 2
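A similar per-character frequency table can be built without awk; a sketch of my own using grep -o . to split the string into one character per line:

```shell
# Each character becomes its own line, then uniq -c counts duplicates
echo "test" | grep -o . | sort | uniq -c
#   1 e
#   1 s
#   2 t
```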
Freeman

awk works well if your server has it

var="text,text,text,text" 
num=$(echo "${var}" | awk -F, '{print NF-1}')
echo "${num}"
hek2mgl

I would suggest the following:

var="any given string"
N=${#var}
G=${var//g/}
G=${#G}
(( G = N - G ))
echo "$G"

No call to any other program
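The same trick generalizes to any character by wrapping it in a function (the name count_char is my own, not from the answer):

```shell
# Count occurrences of the single character $2 in string $1,
# using only parameter expansion -- no external processes.
count_char() {
  local stripped=${1//"$2"/}           # remove every occurrence of $2
  echo $(( ${#1} - ${#stripped} ))     # difference = number of removals
}

count_char "any given string" "g"      # prints 2
count_char "text,text,text,text" ","   # prints 3
```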

Ali
Mathew P V

The awk solutions provided here so far all break if there's a line break in your text. E.g.:

text="one,two,thr
ee,four"
DELIM=','
count=$( awk -F"$DELIM" '{print NF-1}' <<<"${text}" )
echo $count

Result:

2
1

The solution that will also work correctly with line breaks is:

text="one,two,thr
ee,four"
DELIM=','
count=$( awk 'BEGIN{RS="'"$DELIM"'";FS=""}END{print NR-1}' <<<"${text}" )
echo $count

Result is 3.
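Another line-break-safe variant (my own sketch, not from the answer) sums the per-line substitution counts that gsub returns:

```shell
text="one,two,thr
ee,four"
# gsub returns the number of substitutions made on each line;
# summing across all lines counts every comma, line breaks included.
awk '{ n += gsub(/,/, "") } END { print n }' <<<"$text"   # prints 3
```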

Mecki

Count fixed strings (-F) from a file:

searchpattern=','

echo "text,text,text,text" | tr ',' '\n' | sed 's/$/,/' > filename

count=$(grep -F "$searchpattern" filename | wc -l)

echo "$count-1" | bc