
It's giving a count of 2, whereas the pattern occurs thrice in the string


echo "axxxaaxx" | grep -o "xx" | wc -l    # prints 2
echo "axxxaaxx" | grep -o "xx"            # prints xx twice, one per line
user2481458

2 Answers


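grep doesn't support overlapping matches: it consumes the characters it has matched, and the next search starts after them. In "axxxaaxx", grep -o "xx" matches the 1st and 2nd x, resumes at the 3rd x (which is followed by a, so no match), and then matches the final xx, which gives 2. In this case you can enable Perl Compatible Regular Expressions (PCRE) with GNU grep's -P switch and use a positive lookahead assertion like this: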

$ echo "axxxaaxx" | grep -oP "x(?=x)"
x
x
x
$ echo "axxxaaxx" | grep -oP "x(?=x)" | wc -l
3
$

regex1(?=regex2): a positive lookahead assertion matches regex1 only where regex2 follows it. The characters checked by regex2 are NOT consumed, so the next match attempt can start at them again; that's why you get 3 matches.

x(?=x): this positive lookahead assertion matches every x that has another x after it.

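In the substring xxx, the 1st x matches because an x follows it, the 2nd x matches for the same reason, and the 3rd x doesn't (it is followed by a). The 1st x of the trailing xx supplies the 3rd match.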
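The consuming vs. non-consuming difference can also be reproduced without regexes, which may help if your grep lacks -P (it is a GNU grep option and may be missing from other grep implementations). The bash sketch below is a substitute technique, not how grep works internally: it compares a literal substring at every position and, after a hit, either jumps past the match (the effect you see with grep -o) or advances by one character (the effect of the lookahead). The helper name count_substr and its consume/overlap mode argument are hypothetical.

```shell
#!/usr/bin/env bash
# Count occurrences of a literal substring p in string s.
# mode "consume": after a match, skip past it (what grep -o effectively does).
# mode "overlap": after a match, advance by one character (what x(?=x) achieves).
# count_substr is a hypothetical helper for illustration only.
count_substr() {
  local s=$1 p=$2 mode=${3:-overlap}
  local n=${#s} m=${#p} i=0 count=0
  if (( m == 0 )); then
    echo 0   # an empty pattern is treated as having no occurrences (assumption)
    return
  fi
  while (( i + m <= n )); do
    # Quoted right-hand side: compare literally, not as a glob pattern
    if [[ "${s:i:m}" == "$p" ]]; then
      count=$((count + 1))
      if [[ $mode == consume ]]; then
        i=$((i + m))   # consume the matched characters, like grep -o
        continue
      fi
    fi
    i=$((i + 1))       # otherwise move on by one character
  done
  echo "$count"
}

count_substr "axxxaaxx" "xx" consume   # prints 2, like grep -o "xx" | wc -l
count_substr "axxxaaxx" "xx" overlap   # prints 3, like grep -oP "x(?=x)" | wc -l
```

Advancing one character after every hit means every start position gets checked, which is exactly what the lookahead buys you by not consuming the second x.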

More info and easy examples can be found here

riteshtch
  • This is a sub-part of http://stackoverflow.com/questions/37050030/no-of-occurences-of-patterns-in-string/37050288#37050288 – user2481458 May 05 '16 at 12:15

Using -P enables PCRE, which supports lookarounds:

echo "axxxaaxx" | grep -oP '(?<=x)x' | wc -l    # prints 3; without -o, grep prints the whole line once

Here we use a lookbehind, which matches an x only if there is an x before it. The lookbehind inspects the preceding x without consuming it, so the matches can overlap:

How the regex is "evaluated" (the first ^ marks where the lookbehind looks for an x, the second marks the x being matched):

 xxx
^^
|Cursor
Looking for an x at this position; there is nothing here, so this does not match

 xxx
 ^^
 |Cursor
 Looking for an x at this position; one is found, so we have a match

 xxx
  ^^
  |Cursor
  Looking for an x at this position; one is found, so we have a match
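The walk shown above can be mimicked with a small bash loop. This is only an illustration of the logic (trace_lookbehind is a hypothetical helper, and PCRE's real matcher works differently): at each position it checks whether the character is x and, without consuming anything, whether the character before it is also x.

```shell
#!/usr/bin/env bash
# Mimic (?<=x)x: a position matches if its character is "x" and the
# character just before it is also "x". The preceding character is only
# inspected, never consumed, so consecutive matches can overlap.
# trace_lookbehind is a hypothetical helper for illustration only.
trace_lookbehind() {
  local s=$1 i count=0
  for (( i = 0; i < ${#s}; i++ )); do
    # Position 0 has nothing before it, so the lookbehind cannot succeed there
    if (( i > 0 )) && [[ ${s:i-1:1} == x && ${s:i:1} == x ]]; then
      echo "position $i: match"
      count=$((count + 1))
    else
      echo "position $i: no match"
    fi
  done
  echo "total: $count"
}

trace_lookbehind "xxx"
# position 0: no match
# position 1: match
# position 2: match
# total: 2
```

Running it on "axxxaaxx" reports matches at positions 2, 3 and 7, i.e. a total of 3, the same count as the grep -oP command.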
Andreas Louv
  • Thanks. This works for this case, but what I was trying to achieve is this: given a string, return the number of times that a length-2 substring appears in the string and also as the last 2 chars of the string, so "hixxxhi" yields 1 (we won't count the end substring). last2('hixxhi') → 1, last2('xaxxaxaxx') → 1, last2('axxxaaxx') → 2 – user2481458 May 05 '16 at 11:36
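For the follow-up problem in this comment, a regex is not strictly needed: you can slide a 2-character window across the string and compare every window except the final one with the last 2 characters. Because the window moves one character at a time, overlapping occurrences (the xx in xxx) are counted, which is what makes last2('axxxaaxx') come out as 2. This is a minimal bash sketch, assuming strings shorter than 2 characters should yield 0; the function name last2 mirrors the comment, but the implementation is only an illustration.

```shell
#!/usr/bin/env bash
# last2: count how often the final 2 characters of a string also appear
# earlier in it, including overlapping occurrences but excluding the final
# pair itself. Strings shorter than 2 characters give 0 (assumption).
last2() {
  local s=$1
  local n=${#s} i count=0   # separate 'local' so ${#s} sees the new s
  if (( n < 2 )); then
    echo 0
    return
  fi
  local end=${s:n-2:2}
  # Window starts run from 0 to n-3, so the final pair (start n-2) is skipped
  for (( i = 0; i < n - 2; i++ )); do
    if [[ "${s:i:2}" == "$end" ]]; then   # quoted: literal comparison
      count=$((count + 1))
    fi
  done
  echo "$count"
}

last2 "hixxhi"      # prints 1
last2 "xaxxaxaxx"   # prints 1
last2 "axxxaaxx"    # prints 2
```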