Counting regex pattern matches in one line using sed or grep?

Question

I want to count the number of matches on a single line (or across all lines, since there will always be only one line).

I don't want just one match counted per line, as in:

echo "123 123 123" | grep -c -E "123" # Result: 1

Better example:

echo "1 1 2 2 2 5" | grep -c -E '([^ ])( \1){1}' # Result: 1, expected: 2 or 3

There will always be only one piece of data, because I may want to match `123 123` 3 (or 2) times in `123 123 123 123` – Tyilo May 30 '11 at 22:55

score 53 · Accepted Answer · answered May 30 '11 at 22:49

You could use grep -o then pipe through wc -l:

$ echo "123 123 123" | grep -o 123 | wc -l
3
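
A minimal sketch of the same idea wrapped in a reusable function. The name `count_matches` is hypothetical, and the sketch assumes a grep that supports `-o` (GNU and BSD grep do, but the option is not in POSIX):

```shell
# count_matches REGEX TEXT  (hypothetical helper, not part of any tool)
# Prints how many non-overlapping matches of REGEX occur in TEXT.
count_matches() {
  # grep -o prints each match on its own line; wc -l counts those lines.
  # grep exits non-zero when nothing matches, so "|| true" keeps the
  # pipeline usable in strict-mode scripts; the count is then simply 0.
  # tr strips the padding some wc implementations add around the number.
  printf '%s\n' "$2" | { grep -o -E -e "$1" || true; } | wc -l | tr -d ' '
}

count_matches '123' '123 123 123'   # prints 3
count_matches 'xyz' '123 123 123'   # prints 0
```

Passing the pattern with `-e` keeps a regex that starts with `-` from being parsed as an option.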

Simon Whitaker

My version of grep doesn't know what `-o` is :( – manojlds May 30 '11 at 22:52
You need to ask Father Christmas for a new grep this year. :) – Simon Whitaker May 30 '11 at 22:54
@manojlds, do you have `egrep`? Same thing would work w/ `egrep` – Mike Pennington May 30 '11 at 22:54
@Mike Pennington - thanks, `egrep` says the same. I am on Windows now, so i think it's expected. – manojlds May 30 '11 at 22:57
@Tyilo - that's not surprising, look at your regex. It's asking for 0 or more instances of anything other than a space, followed by a space, followed by the first thing you matched again. Note: **0 or more**. There are indeed five such matches in your string (assuming you don't rewind after each match). They are: 1) `"1 1"` (bytes 1-3), 2) `" "` (i.e. zero instances of something that isn't a space, followed by a space, followed by the same zero instances again - byte 4), 3) `"2 2"` (bytes 5-7), 4) `" "` (byte 8) and finally 5) `" "` (byte 10). Phew! – Simon Whitaker May 30 '11 at 23:00
(Run it without the pipe to `wc -l` at the end and you'll see them.) – Simon Whitaker May 30 '11 at 23:03
What still results in 2? Your grep -E example, or my answer to your question? (I get 5 from the former, 3 from the latter.) – Simon Whitaker May 30 '11 at 23:06
This: `echo "1 1 2 2 2 5" | grep -o -E '([^ ])( \1){1}' | wc -l` – Tyilo May 30 '11 at 23:08
As far as I know, `grep -o` won't rewind on finding a match, so this will only match "1 1" (bytes 1-3) and "2 2" (bytes 5-7). It won't match bytes 7-9 ("2 2") because by the time it comes to consider bytes 8 onwards it's already consumed bytes 1-7 in the previous two matches. – Simon Whitaker May 30 '11 at 23:11
Why does echo "1 1 2 2 2 5" | grep -o 2 | wc -l which gives 3 not meet your requirement? – grok12 May 31 '11 at 01:15
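
The non-rewinding behaviour described in the comments above can be checked directly. This is a sketch that assumes GNU grep, since backreferences inside `-E` patterns and the `-o` option are both extensions beyond POSIX:

```shell
# List the matches first, then count them.
# The scan resumes after the end of each match, so the second "2 2"
# (bytes 7-9) is never considered: byte 7 was already consumed.
matches=$(echo "1 1 2 2 2 5" | grep -o -E '([^ ])( \1){1}')
printf '%s\n' "$matches"
# prints:
# 1 1
# 2 2

count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$count"   # prints 2
```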

score 1 · Answer 2 · answered May 30 '11 at 22:46

Maybe something like this:

echo "123 123 123" | sed "s/123 /123\n/g" | wc -l

(Maybe ugly, but my bash-fu is not that great.)

manojlds

@Tyilo - what did you try? I am getting 3 for the above input – manojlds May 30 '11 at 22:58
Copied and pasted your code, but I remember now that my sed doesn't support `\n` – Tyilo May 30 '11 at 22:59
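
For a sed whose replacement text does not understand `\n`, as in the comment above, a backslash followed by a literal newline should work instead. This is a sketch of a variant rather than the answer's exact command: it puts a newline after every match and then counts the lines that contain one, so it does not depend on the spaces between the matches.

```shell
# "&" re-inserts the matched text; backslash + real newline adds a line break
# after it, which is the portable way to write a newline in a sed replacement.
# grep -c then counts the lines that contain the pattern (one per match);
# the empty leftover line at the end is not counted.
count=$(echo "123 123 123" | sed 's/123/&\
/g' | grep -c '123')
echo "$count"   # prints 3
```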

score 1 · Answer 3 · answered May 31 '11 at 12:53

Maybe you should convert spaces to newlines first:

$ echo "1 1 2 2 2 5" | tr ' ' $'\n' | grep -c 2
3
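
One thing to watch with this approach: `grep -c 2` counts every line that contains a `2`, so tokens such as `12` or `22` would be counted too. A small sketch with a made-up input, using the POSIX `-x` option to require whole-line matches:

```shell
# Hypothetical input with two extra tokens that merely contain a "2".
input="1 1 2 2 2 5 12 22"

loose=$(echo "$input" | tr ' ' '\n' | grep -c '2')
exact=$(echo "$input" | tr ' ' '\n' | grep -c -x '2')

echo "$loose"   # prints 5 (2, 2, 2, 12 and 22 all contain a 2)
echo "$exact"   # prints 3 (only the tokens that are exactly "2")
```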

glenn jackman

score 0 · Answer 4 · answered Sep 03 '15 at 15:38

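Why not use awk? You could use awk '{print gsub(your_regex,"&")}' to print the number of matches on each line, or awk '{c+=gsub(your_regex,"&")}END{print c}' to print the total number of matches. Here your_regex is a placeholder for a regex such as /123/; gsub replaces every match with itself ("&") and returns the number of replacements it made. Note that relative speed may vary depending on which awk implementation is used, and which input is given.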
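
A short sketch of both awk commands on a made-up two-line input. The `c + 0` in the `END` block is an addition not in the answer; it forces a numeric `0` to be printed when nothing matched, instead of an empty line:

```shell
# Hypothetical two-line input.
input=$(printf '1 1 2 2 2 5\n2 2\n')

# Matches per line: gsub returns the number of substitutions it made.
per_line=$(printf '%s\n' "$input" | awk '{ print gsub(/2/, "&") }')
printf '%s\n' "$per_line"
# prints:
# 3
# 2

# Total over all lines.
total=$(printf '%s\n' "$input" | awk '{ c += gsub(/2/, "&") } END { print c + 0 }')
echo "$total"   # prints 5
```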

edited Sep 04 '15 at 16:57

answered Sep 03 '15 at 15:38

jarno

Another way, using gawk, is `gawk -v FPAT=your_regex '{print NF}'` or `gawk -v FPAT=your_regex '{c+=NF}END{print c}'`, respectively. – jarno Sep 03 '15 at 17:57
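
A sketch of the `FPAT` variant from the comment above. `FPAT` is specific to gawk, so the example checks that gawk is installed first; the fallback message is only there for illustration:

```shell
if command -v gawk >/dev/null 2>&1; then
  # FPAT describes what a field looks like, so NF becomes the number of
  # substrings on the line that match the pattern.
  fpat_count=$(echo "1 1 2 2 2 5" | gawk -v FPAT='2' '{ print NF }')
else
  fpat_count="gawk not installed"
fi
echo "$fpat_count"   # prints 3 when gawk is available
```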

score 0 · Answer 5 · answered Sep 04 '15 at 10:21

This might work for you:

sed -n -e ':a' -e 's/123//p' -e 'ta' file | sed -n '$='

With GNU sed this could be written:

sed -n ':;s/123//p;t' file | sed -n '$='

edited Sep 05 '15 at 06:05

answered Sep 04 '15 at 10:21

potong

The first script doesn't work with GNU sed 4.2.2: "sed: can't find label for jump to `a'". It seems to work better if you replace `:ta` with `:a`. The scripts seem to require a newline at the end of the input. Besides, the script outputs nothing if no matches are found. Test: `printf 123 | sed -n ':;s/123//p;t' | sed -n '$='` outputs nothing. – jarno Sep 04 '15 at 18:09
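
A sketch of how the sed loop behaves, with one possible workaround for the empty output noted in the comment above. It reads from a pipe instead of `file`, and the `${count:-0}` default is an addition, not part of the answer:

```shell
# Each pass deletes the first remaining "123" and prints the line (p);
# "ta" jumps back to label a while a substitution succeeded.
# So the first sed emits one line per match, and sed -n '$=' prints
# the number of the last line, i.e. how many lines it received.
count=$(printf '123 abc 123 123\n' \
  | sed -n -e ':a' -e 's/123//p' -e 'ta' \
  | sed -n '$=')
echo "${count:-0}"   # prints 3

# With no match the first sed prints nothing, so the second sed prints
# nothing either; the :-0 default turns that empty result into 0.
none=$(printf 'abc\n' \
  | sed -n -e ':a' -e 's/123//p' -e 'ta' \
  | sed -n '$=')
echo "${none:-0}"    # prints 0
```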
