3

I'd like to count the instances of 'text' after each 'header'. I'm using grep and awk, but I'm open to other tools. My file looks like this:

header1
text1
text2
text3
header2
text1
header3
header4
text1
text2
...

The desired output would look like this:

header1 3
header2 1
header3 0
header4 2
...

My question is similar to this, but instead of the total number of occurrences, I need the number of occurrences between consecutive matches of a certain string.

Community
philshem

2 Answers

4

This awk command does not store the entire file in memory:

awk '/^header/{if (head) print head,k;head=$1; k=0}!/^header/{k++}END{print head,k}' file

If you are only interested in counting the lines containing text, then change the script to this:

awk '/^header/{if (head) print head,k;head=$1; k=0}/text/{k++}END{print head,k}' file
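
For readability, the same streaming idea can be spelled out over several lines. This is only a sketch under a few assumptions: the sample file is recreated in a temporary file, the names `sample` and `counts` are made up for the illustration, and the `next` statement and the `head != ""` guards are small additions that are not part of the one-liner above.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' header1 text1 text2 text3 header2 text1 header3 header4 text1 text2 > "$sample"

counts=$(awk '
  /^header/ {                            # a new section starts here
      if (head != "") print head, k      # flush the previous section, if any
      head = $1; k = 0
      next                               # a header line is never counted itself
  }
  /text/ { k++ }                         # count only lines containing "text"
  END { if (head != "") print head, k }  # flush the last section
' "$sample")

printf '%s\n' "$counts"
rm -f "$sample"
```

Only the current header and its running count are held at any time, so memory use does not grow with the size of the file.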
user000001
2

With awk:

$ awk '{if (/header/) {h=$0; a[h]=0} if (/text/) {a[h]++}} END{for (i in a) {print i" "a[i]}}' file
header1 3
header2 1
header3 0
header4 2
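  • {if (/header/) {h=$0; a[h]=0} if (/text/) {a[h]++}} fills the array a[], keyed by header line, with the number of "text" lines that follow each "header" line.
  • END{for (i in a) {print i" "a[i]}} prints the result after reading the file. Note that awk does not guarantee the iteration order of for (i in a), so the headers may not be printed in file order.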
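
If the headers must come out in the order they appear in the file, one possible variant records that order in a second array instead of relying on `for (i in a)`. This is a sketch, not the command from the answer: the names `order`, `count`, `cur`, `n`, `sample` and `ordered` are all made up for the illustration, and the sample file is recreated in a temporary file.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' header1 text1 text2 text3 header2 text1 header3 header4 text1 text2 > "$sample"

ordered=$(awk '
  /^header/ {                  # remember each header in order of appearance
      order[++n] = $0
      count[$0] = 0
      cur = $0
      next
  }
  /text/ && cur != "" { count[cur]++ }   # count "text" lines under the current header
  END {
      # walk the numeric index so the output follows file order
      for (i = 1; i <= n; i++) print order[i], count[order[i]]
  }
' "$sample")

printf '%s\n' "$ordered"
rm -f "$sample"
```

Like the array-based command above, this keeps one entry per header in memory; it also assumes header lines are unique, since a repeated header would share a single counter.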
fedorqui