5

I need a C# string search algorithm that can match multiple, overlapping occurrences of a pattern. For example, if the pattern is 'AA' and the string is 'BAAABBB', Regex produces a single match at Index = 1, but I need the result Index = 1, 2. Can I force Regex to give such a result?

nhahtdh
  • The pattern '(?=A)' gives good results but enormously extends the calculation time. I have a string with 20M characters, and calculation speed is very important. Does anyone have another solution? Thanks. –  Jan 09 '09 at 12:50
  • "(?=A)" doesn't do what you want anyway; have you tried "A(?=A)" like AnthonyWJones suggested? – Alan Moore Jan 10 '09 at 16:56

6 Answers

13

Use a lookahead pattern:

"A(?=A)"

This finds any A that is followed by another A without consuming the following A. Hence AAA will match this pattern twice.
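As a minimal sketch of how this pattern behaves on the string from the question, the snippet below collects the index of every match. The class and method names (`LookaheadDemo`, `OverlappingIndexes`) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: start index of every 'A' that is
    // immediately followed by another 'A'.
    public static int[] OverlappingIndexes(string input)
    {
        // The lookahead (?=A) checks the next character without consuming it,
        // so the following 'A' is still available to start the next match.
        return Regex.Matches(input, "A(?=A)")
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }

    public static void Main()
    {
        // prints 1,2
        Console.WriteLine(string.Join(",", OverlappingIndexes("BAAABBB")));
    }
}
```

Each match here is one character long (the first A of a pair), so `Match.Index` is the position the question asks for.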

AnthonyWJones
4

To summarize all previous comments:

Dim rx As Regex = New Regex("(?=AA)")
Dim mc As MatchCollection = rx.Matches("BAAABBB")

This will produce the result you are requesting.

EDIT:
Here is the C# version (I am working with VB.NET today, so I accidentally answered in VB.NET).

Regex rx = new Regex("(?=AA)");
MatchCollection mc = rx.Matches("BAAABBB");
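If the pattern is not fixed in advance, the same idea might be generalized by wrapping the escaped search text in a lookahead. This is only a sketch: `OverlapRegex` and `FindOverlapping` are hypothetical names, and it assumes the search text is meant literally rather than as a regex.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlapRegex
{
    // Hypothetical helper: start indexes of every (possibly overlapping)
    // occurrence of a literal string.
    public static List<int> FindOverlapping(string text, string literal)
    {
        // Regex.Escape keeps characters such as '.' or '*' in the literal
        // from being interpreted as regex operators.
        var rx = new Regex("(?=" + Regex.Escape(literal) + ")");
        var result = new List<int>();
        foreach (Match m in rx.Matches(text))
        {
            // The match itself is zero-width; only its position matters.
            result.Add(m.Index);
        }
        return result;
    }

    public static void Main()
    {
        // prints 1,2
        Console.WriteLine(string.Join(",", FindOverlapping("BAAABBB", "AA")));
    }
}
```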
Sani Huttunen
0

Any regular expression can return all of its matches as a MatchCollection (via Regex.Matches).

Dror
0

Try this:

        // Assumes: using System.Text.RegularExpressions;
        Regex regX = new Regex("(?=AA)"); // zero-width lookahead, so matches may overlap
        string index = "", str = "BAAABBB";

        MatchCollection matchCol = regX.Matches(str);
        foreach (Match mat in matchCol)
        {
            index = index + mat.Index + ","; // append each match position
        }

After the loop, index holds "1,2,"; remove the trailing comma to get the result you are looking for.

Lonzo
0

Are you really looking for substrings that are only two characters long? If so, searching a 20-million-character string is going to be slow no matter what regex you use (or any non-regex technique, for that matter). If the search string is longer, the regex engine can employ a search algorithm like Boyer-Moore or Knuth-Morris-Pratt to speed up the search; the longer the pattern, the better, in fact.
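For comparison, a non-regex approach could look like the sketch below: it calls `String.IndexOf` repeatedly and restarts one character after each hit so that overlapping occurrences are not skipped. `OverlapSearch` and `FindOverlapping` are hypothetical names, and the ordinal comparison is an assumption about what the asker wants. No timing claim is made here; whether it beats the regex on a 20M-character string would have to be measured.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapSearch
{
    // Hypothetical helper: start indexes of every (possibly overlapping)
    // occurrence of pattern in text, using plain string search.
    public static List<int> FindOverlapping(string text, string pattern)
    {
        var result = new List<int>();
        if (string.IsNullOrEmpty(pattern))
        {
            return result; // an empty pattern would match at every position
        }

        // Ordinal comparison avoids culture-sensitive matching rules.
        int i = text.IndexOf(pattern, 0, StringComparison.Ordinal);
        while (i >= 0)
        {
            result.Add(i);
            // Advance by one character, not by pattern.Length,
            // so that overlapping occurrences are found too.
            i = text.IndexOf(pattern, i + 1, StringComparison.Ordinal);
        }
        return result;
    }

    public static void Main()
    {
        // prints 1,2
        Console.WriteLine(string.Join(",", FindOverlapping("BAAABBB", "AA")));
    }
}
```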

By the way, the kind of search you're talking about is called overlapping matches; I'll add that to the tags.

Alan Moore