I need C# string search algorithm which can match multiple occurance of pattern. For example, if pattern is 'AA' and string is 'BAAABBB' Regex produce match result Index = 1, but I need result Index = 1,2. Can I force Regex to give such result?
-
pattern '(?=A)' gives good results but enormously exten calc time. I have a string with 20M characters and calc speed is very important. Does anyone has other solution? Thanks. – Jan 09 '09 at 12:50
-
"(?=A)" doesn't do what you want anyway; have you tried "A(?=A)" like AnthonyWJones suggested? – Alan Moore Jan 10 '09 at 16:56
6 Answers
Use a lookahead pattern:-
"A(?=A)"
This finds any A that is followed by another A without consuming the following A. Hence AAA will match this pattern twice.

- 187,081
- 35
- 232
- 306
To summarize all previous comments:
Dim rx As Regex = New Regex("(?=AA)")
Dim mc As MatchCollection = rx.Matches("BAAABBB")
This will produce the result you are requesting.
EDIT:
Here is the C# version (working with VB.NET today so I accidentally continued with VB.NET).
Regex rx = new Regex("(?=AA)");
MatchCollection mc = rx.Matches("BAAABBB");

- 23,620
- 6
- 72
- 79
Try this:
System.Text.RegularExpressions.MatchCollection matchCol;
System.Text.RegularExpressions.Regex regX = new System.Text.RegularExpressions.Regex("(?=AA)");
string index="",str="BAAABBB";
matchCol = regX.Matches(str);
foreach (System.Text.RegularExpressions.Match mat in matchCol)
{
index = index + mat.Index + ",";
}
The contents of index are what you are looking for with the last comma removed.

- 2,758
- 4
- 22
- 27
Are you really looking for substrings that are only two characters long? If so, searching a 20-million character string is going to be slow no matter what regex you use (or any non-regex technique, for that matter). If the search string is longer, the regex engine can employ a search algorithm like Boyer-Moore or Knuth-Morris-Pratt to speed up the search--the longer the better, in fact.
By the way, the kind of search you're talking about is called overlapping matches; I'll add that to the tags.

- 73,866
- 12
- 100
- 156