LookAhead Regex in .Net - unexpected result

Question

I am a bit puzzled by my regex results (and still trying to get my head around the syntax). I have been using http://regexpal.com/ to test my expression, and it works as intended there; in C#, however, it does not behave as expected.

Here is a test: the expression (?=<open>).*?(?=</open>)

applied to the input string <open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>

I would expect to get back <open>Text 1, <open>Text 2, <open>Text 3, and so on.

However, when I do this in C#, it only returns the first match, <open>Text 1.

How do I get all five 'results' back from the Regex?

    using System.Text.RegularExpressions;

    Regex exx = new Regex("(?=<open>).*?(?=</open>)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    string input = "<open>Text 1</open> Text 2 <open> Text 3 </open> Text 4 <open> Text 5 </open>";
    // Match (static or instance) returns only the first occurrence: "<open>Text 1"
    string result = exx.Match(input).Value;

I see only three matches there. – Joey Mar 17 '10 at 10:49

score 1 · Accepted Answer · edited May 23 '17 at 10:27


Use Regex.Matches instead of Regex.Match.

    PS Home:> $s = '<open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>'
    PS Home:> $re = '(?=<open>).*?(?=</open>)'
    PS Home:> @([regex]::Match($s, $re)).Length
    1
    PS Home:> @([regex]::Matches($s, $re)).Length
    3
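For reference, here is a minimal C# sketch of the same comparison, using the pattern and options from the question. It assumes a console project with top-level statements (C# 9 or later); the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Same input as the PowerShell session; pattern and options from the question.
string input = "<open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>";
string pattern = "(?=<open>).*?(?=</open>)";
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

// Regex.Match stops at the first successful match.
Match first = Regex.Match(input, pattern, options);

// Regex.Matches collects every non-overlapping match in the string.
MatchCollection all = Regex.Matches(input, pattern, options);

Console.WriteLine($"First: '{first.Value}'");
foreach (Match m in all)
{
    Console.WriteLine($"Match at {m.Index}: '{m.Value}'");
}
Console.WriteLine($"Total: {all.Count}");
```

Only three matches come back, because "Text 2" and "Text 4" are not preceded by an <open> tag.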

As the documentation for Regex.Match states:

Searches an input string for a substring that matches a regular expression pattern and returns the first occurrence as a single Match object.

whereas for Regex.Matches:

Searches an input string for all occurrences of a regular expression and returns all the successful matches.
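Regex.Match is not a dead end, though: the Match object it returns has a NextMatch() method that resumes the search where the previous match ended. A minimal sketch of that loop, which yields the same results as Regex.Matches for this input (assuming top-level statements, C# 9 or later):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "<open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>";
var regex = new Regex("(?=<open>).*?(?=</open>)", RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Start with the first match, then keep asking for the next one
// until a match reports Success == false.
var found = new List<string>();
for (Match m = regex.Match(input); m.Success; m = m.NextMatch())
{
    found.Add(m.Value);
}

Console.WriteLine(string.Join(" | ", found));
```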

Note: what you're doing here looks like a bad idea. If you're dealing with XML or a similar language, please don't use regular expressions to parse it; nested structures will drive you mad otherwise.
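If the markup really is well-formed XML, a parser sidesteps the problem. Below is a hedged sketch using LINQ to XML (System.Xml.Linq). It assumes the sample fragment can be wrapped in a synthetic <root> element to form a single well-formed document; real-world HTML is often not well-formed XML, so this only applies to XML-like input.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string input = "<open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>";

// The fragment has several top-level elements plus loose text, so it is not a
// well-formed document on its own. The <root> wrapper is an assumption of this sketch.
XElement root = XElement.Parse("<root>" + input + "</root>");

// Elements("open") yields only the direct <open> children of <root>;
// Value is the concatenated text content of each element.
string[] texts = root.Elements("open").Select(e => e.Value).ToArray();

foreach (string t in texts)
{
    Console.WriteLine($"'{t}'");
}
```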

answered Mar 17 '10 at 10:44 by Joey (edited May 23 '17 at 10:27 by Community)

In that case, you can improve your karma by upvoting and accepting Johannes' answer (see the up-triangle and checkbox next to this post). – Tim Pietzcker Mar 17 '10 at 10:56
I missed the Matches method. Thanks for that link as well. I am doing some basic HTML parsing/scraping, nothing too complex (I think...). I was using a for loop and trawling through the string byte by byte, but thought a regex would be better (it's certainly a lot less code!). I'll have a good read through that question, though. – AaronM Mar 17 '10 at 11:01
I can accept the answer (and thanks for pointing me to the place; I was looking for some text to click on, not an image!), but I can't upvote unless I become a member :( – AaronM Mar 17 '10 at 11:04

@Aaron: For HTML scraping you can use the HTML Agility Pack (http://www.codeplex.com/htmlagilitypack) which should be a lot more robust than using regular expressions. – Joey Mar 17 '10 at 11:06
@Aaron: Two more rep points to go, and you'll be able to upvote :) – Tim Pietzcker Mar 17 '10 at 12:16

score 0 · Answer 2 · answered Mar 17 '10 at 11:02


Do you really want to have <open> at the start of every match? Why not use lookbehind, too?

    (?<=<open>).*?(?=</open>)
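In C#, the lookbehind version can be used with Regex.Matches as in the minimal sketch below (top-level statements, C# 9 or later). The second pattern, which matches the tags literally and captures the inner text in a group so the full match still contains the opening tag, is an alternative not taken from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "<open>Text 1 </open>Text 2 <open>Text 3 </open>Text 4 <open>Text 5 </open>";

// Lookbehind + lookahead: both tags are asserted but not consumed,
// so each match is only the text between them.
string[] inner = Regex.Matches(input, "(?<=<open>).*?(?=</open>)")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();

// Alternative: consume the tags and capture the text in group 1.
// m.Value keeps the tags; m.Groups[1].Value is just the inner text.
Match[] withTags = Regex.Matches(input, "<open>(.*?)</open>").Cast<Match>().ToArray();

Console.WriteLine(string.Join(" | ", inner));
foreach (Match m in withTags)
{
    Console.WriteLine($"full: '{m.Value}', text: '{m.Groups[1].Value}'");
}
```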

answered Mar 17 '10 at 11:02 by Tim Pietzcker

Ah, good point. It looks bad in the test data I used, but in the real data I am parsing, the opening tag can be helpful. Thanks, though; regex is all new to me and will take a bit of getting used to. Now Johannes has given me something else to look at as well! – AaronM Mar 17 '10 at 11:12
