9

The following code evaluates to 2 instead of 4:

Regex.Matches("020202020", "020").Count;  // 2

I'm guessing the regex starts looking for the next match from the end of the previous match. Is there any way to prevent this? I have a string of '0's and '2's and I'm trying to count how many times I have three '2's in a row, four '2's in a row, etc.

krlmlr
  • Your question is misleading. Do you want to match consecutive `2`-s, or arbitrary sequences? – krlmlr Aug 13 '12 at 22:23

5 Answers

10

This will return 4 as you expect:

Regex.Matches("020202020", @"0(?=20)").Count;  // 4

The lookahead matches the 20 without consuming it, so the next match attempt starts at the position following the first 0. You can even do the whole regex as a lookahead:

Regex.Matches("020202020", @"(?=020)").Count;  // 4

The regex engine automatically bumps ahead one position each time it makes a zero-length match. So, to find all runs of three 2's or four 2's, you can use:

Regex.Matches("22222222", @"(?=222)").Count;  // 6

...and:

Regex.Matches("22222222", @"(?=2222)").Count;  // 5
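
To see why the counts differ, it can help to print where each match starts. The following is only a sketch: the class name `MatchPositions` and the helper `Starts` are made up for illustration, and only `Regex.Matches` and `Match.Index` come from .NET itself.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchPositions
{
    // Hypothetical helper: start index of every match of pattern in input.
    public static int[] Starts(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();

    public static void Main()
    {
        const string input = "020202020";

        // Plain pattern: each match consumes three characters,
        // so the search resumes after them.
        Console.WriteLine(string.Join(",", Starts(input, "020")));      // 0,4

        // Lookahead variants: at most one character is consumed,
        // so overlapping occurrences are found too.
        Console.WriteLine(string.Join(",", Starts(input, "0(?=20)")));  // 0,2,4,6
        Console.WriteLine(string.Join(",", Starts(input, "(?=020)")));  // 0,2,4,6
    }
}
```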

EDIT: Looking over your question again, it occurs to me you might be looking for 2's interspersed with 0's:

Regex.Matches("020202020", @"(?=20202)").Count;  // 2

If you don't know how many 0's there will be, you can use this:

Regex.Matches("020202020", @"(?=20*20*2)").Count;  // 2

And of course, you can use quantifiers to reduce repetition in the regex:

Regex.Matches("020202020", @"(?=2(?:0*2){2})").Count;  // 2
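
If the number of 2's varies, the quantified pattern can be built from a parameter. This is a rough sketch rather than part of the original answer: `SpacedTwos`, `BuildPattern` and `Count` are hypothetical names, and it assumes n is at least 1.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacedTwos
{
    // Builds (?=2(?:0*2){n-1}): n '2's with any number of '0's between neighbours.
    public static string BuildPattern(int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        return "(?=2(?:0*2){" + (n - 1) + "})";
    }

    // Counts the positions at which such a sequence starts (overlaps included).
    public static int Count(string input, int n) =>
        Regex.Matches(input, BuildPattern(n)).Count;

    public static void Main()
    {
        Console.WriteLine(BuildPattern(3));        // (?=2(?:0*2){2})
        Console.WriteLine(Count("020202020", 3));  // 2
        Console.WriteLine(Count("020202020", 4));  // 1
    }
}
```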
Alan Moore
4

Indeed, the regex engine resumes searching where the previous match ended. You can work around this with a lookahead. I'm not a .NET guy, but try this pattern: "(?=020)." (the trailing dot is part of the pattern). Translation: "find me any single character, where this character and the next two characters are 020". The trick is that the match is only one character wide, not three, so you will get all the matches in the string, even if they overlap.

(you could also write it as "0(?=20)", but that's less clear to humans at least :p )

Amadan
1

Try this, using zero-width positive lookbehind:

Regex.Matches("020202020",@"(?<=020)").Count;

Worked for me, yields 4 matches.
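
One difference from the lookahead versions is where the zero-width matches sit: with a lookbehind, each match is reported at the position just after an occurrence rather than at its start. A small sketch to illustrate this (the names `LookbehindPositions` and `Starts` are hypothetical):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindPositions
{
    // Hypothetical helper: index of every match of pattern in input.
    public static int[] Starts(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();

    public static void Main()
    {
        const string input = "020202020";

        // Lookbehind: matches sit right after each "020".
        Console.WriteLine(string.Join(",", Starts(input, "(?<=020)")));  // 3,5,7,9

        // Lookahead: matches sit at the start of each "020".
        Console.WriteLine(string.Join(",", Starts(input, "(?=020)")));   // 0,2,4,6
    }
}
```

The count is the same either way; the position only matters if you use `Match.Index` afterwards.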

My favorite reference for Regex: Regular Expression Language - Quick Reference. Also, a quick way to try out your Regex, which I use quite often for complex patterns: Free Regular Expression Designer.

crlanglois
0

Assuming that you are indeed looking for sequences of consecutive 2-s, there is another option without using lookaheads at all. (This would not work for arbitrary sequences where you look for patterns of 0 and 2.)

Enumerate all occurrences of non-overlapping sequences of three or more 2-s (how?) and then infer the number of shorter subsequences.

For example, if you find one sequence of six consecutive 2-s and one of five consecutive 2-s, then you know that you must have (6-3+1) + (5-3+1) = 7 sequences of three consecutive 2-s (potentially overlapping), and so on:

0002222220000002222200
   222
    222
     222
      222
               222
                222
                 222

For large strings, this should be somewhat faster than using lookaheads.
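
One possible way to do the enumeration is a greedy `2{k,}` pattern, which matches each maximal run of at least k 2-s exactly once. The sketch below is an assumption about how this could be coded, not a tested benchmark; `RunCounter` and `CountWindows` are made-up names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RunCounter
{
    // Counts possibly overlapping windows of k consecutive '2's by finding
    // each maximal run of at least k '2's and adding (run length - k + 1).
    public static int CountWindows(string input, int k)
    {
        if (k < 1) throw new ArgumentOutOfRangeException(nameof(k));

        // The quantifier is greedy, so each match covers a whole run.
        string pattern = "2{" + k + ",}";
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Sum(m => m.Length - k + 1);
    }

    public static void Main()
    {
        const string input = "0002222220000002222200";
        Console.WriteLine(CountWindows(input, 3));  // 7 = (6-3+1) + (5-3+1)
        Console.WriteLine(CountWindows(input, 4));  // 5 = (6-4+1) + (5-4+1)
    }
}
```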

krlmlr
-4

Because the source contains only two non-overlapping occurrences of "020", which is what your regex pattern matches. Try changing your pattern to this:

Regex.Matches("020202020", "02").Count;

Now it will match 02's in a row and you will get four this time.

DelegateX
  • 1
    It will return the same result for `"029029029029"` as well. Looking for `"02"` is not equivalent to looking for `"020"`. – Amadan Aug 13 '12 at 22:24
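
The point of this comment can be checked with a quick comparison. This is a sketch only; `PatternComparison` and `Count` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Hypothetical helper: number of matches of pattern in input.
    public static int Count(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    public static void Main()
    {
        // "02" gives the hoped-for number on the original input...
        Console.WriteLine(Count("020202020", "02"));          // 4

        // ...but also on an input that contains no "020" at all.
        Console.WriteLine(Count("029029029029", "02"));       // 4
        Console.WriteLine(Count("029029029029", "(?=020)"));  // 0
    }
}
```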