Stack Overflow already has some great posts about counting occurrences of a string (e.g. "foo"), such as "count all occurrences of string in lots of files with grep". However, I've been unable to find an answer to a slightly more involved variant.

Let's say I want to count how many instances of "foo:[*whatever*]*whatever else*" exist in a folder; I'd do:

grep -or 'foo:[(.*)]' * | wc -l

and I'd get back "55" (or whatever the count is). But what if I have a file like:

foo:bar abcd
foo:baz efgh
not relevant line
foo:bar xyz

and I want to count how many instances there are of foo:bar vs. foo:baz, etc.? In other words, I'd like output that's something like:

bar 2
baz 1

I assume there's some way to chain greps, or use a different command from wc, but I have no idea what it is ... any shell scripting experts out there have any suggestions?

P.S. I realize that if I knew the set of possible sub-strings (i.e. if I knew there were only "foo:bar" and "foo:baz") this would be simpler, but unfortunately the set of "things that can come after foo:" is unknown.

machineghost

1 Answer

You could use sort and uniq -c:

$ grep -orE 'foo:(.*)' * | sort | uniq -c
      2 foo:bar
      1 foo:baz
Gumbo
  • That's awesome, thank you! ... except I'm afraid I over-generalized in my original question. There's also potentially (irrelevant) text after the grepped text, and that text would muck up the `uniq` I think? I've tried to edit the question to be clearer. – machineghost May 03 '13 at 21:15
  • @machineghost `-o` should give you only the actually matched text. Use a pattern other than `.*` if the matched portion is too much. – Gumbo May 03 '13 at 21:17
  • @machineghost Yes, try something like `\S+` (one or more non-whitespace characters) instead of `.*`. – Gumbo May 03 '13 at 21:18
  • Got it (in my actual case I had a closing paren as the boundary, so I wound up doing `foo\([^)]+\)`). – machineghost May 03 '13 at 21:20
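Putting the answer and the comments together, here is a minimal sketch of the whole pipeline run against the sample file from the question. A few things in it are assumptions rather than anything the posters wrote: the file name `sample.txt` and the helper name `count_foo` are made up for illustration, `[^[:space:]]+` is used as a more portable spelling of `\S+`, and `-h` is added so that a recursive grep does not prefix each match with its file name (both `-o` and `-h` are available in GNU and BSD grep, but are not required by POSIX). The trailing `sed` and `awk` steps only reshape the `uniq -c` output into the "name count" form the question asks for.

```shell
#!/bin/sh
# Throwaway directory holding the sample file from the question
# (the name sample.txt is hypothetical).
dir=$(mktemp -d)
printf '%s\n' 'foo:bar abcd' 'foo:baz efgh' 'not relevant line' 'foo:bar xyz' \
    > "$dir/sample.txt"

# count_foo DIR: hypothetical helper that tallies what follows "foo:".
#   -r  recurse into DIR          -h  omit the "file:" prefix
#   -o  print only the match      -E  extended regular expressions
# [^[:space:]]+ ends the match at the first whitespace character.
count_foo() {
    grep -rhoE 'foo:[^[:space:]]+' "$1" |
        sed 's/^foo://' |         # drop the fixed prefix, keep the variable part
        sort |                    # uniq only merges adjacent duplicates
        uniq -c |                 # prefix each distinct value with its count
        awk '{ print $2, $1 }'    # reorder "count value" to "value count"
}

count_foo "$dir"
# prints:
# bar 2
# baz 1

rm -rf "$dir"
```

If the values are delimited by something other than whitespace, only the bracket expression needs to change, as in the closing-paren case mentioned in the last comment.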