Find string using regex and regular expressions

Question

I have this text, and I am trying to print a1 and a2:

<a href="a1" title="t1"> k1 </a>
<a href="a2" title="t2"> k2 </a>

Here is my attempt:

string html =  "<a href=\"a1\" title=\"t1\"> k1 </a>";
       html += "<a href=\"a2\" title=\"t2\"> k2 </a>";

// Here is how I think my pattern should work:
// <a href=" [something that is not a quote, 0 or more times] " [anything] </a>
Regex regex = new Regex("<a href=\"([^\"]*)\".*</a>");
foreach (Match match in regex.Matches(html))
    Console.WriteLine(match.Groups[1]);

Why does this only print a1? I am pretty sure I am doing it right. What am I missing?

I don't code in C#, but I think the `.*` should be `.*?` so it is non-greedy. Currently you'll go to the last `</a>`. – chris85, May 06 '15 at 00:01

score 2 · Accepted Answer · edited May 23 '17 at 12:30


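The `.*` in your regular expression is greedy: it consumes every character up to the last </a>, so both links end up inside a single match and only the first href is captured. What you need is lazy matching with `.*?`, which stops at the first </a> and therefore produces one match per link: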

Regex regex = new Regex("<a href=\"([^\"]*)\".*?</a>");
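
To see the difference on the question's input, here is a minimal sketch that runs both patterns side by side. The class name `HrefDemo` and the helper `ExtractHrefs` are made up for this illustration; only `Regex.Matches` and `Match.Groups` come from the .NET library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefDemo
{
    // Hypothetical helper: returns the first capture group of every match.
    public static List<string> ExtractHrefs(string html, string pattern)
    {
        var hrefs = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern))
        {
            hrefs.Add(match.Groups[1].Value);
        }
        return hrefs;
    }

    public static void Main()
    {
        string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
                    + "<a href=\"a2\" title=\"t2\"> k2 </a>";

        // Greedy: .* runs to the last </a>, so both anchors form one match.
        List<string> greedy = ExtractHrefs(html, "<a href=\"([^\"]*)\".*</a>");

        // Lazy: .*? stops at the first </a>, giving one match per anchor.
        List<string> lazy = ExtractHrefs(html, "<a href=\"([^\"]*)\".*?</a>");

        Console.WriteLine(string.Join(", ", greedy)); // a1
        Console.WriteLine(string.Join(", ", lazy));   // a1, a2
    }
}
```

Reading `match.Groups[1].Value` makes the string conversion explicit; the question's `Console.WriteLine(match.Groups[1])` prints the same text.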

Meanwhile, see "Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms".
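
As one possible parser-based alternative, the sketch below reads the href values with `System.Xml.Linq` instead of a regex. It assumes the fragment is well-formed XML, which the two links in the question are but real-world HTML often is not; a dedicated HTML parser would be needed in that case. The names `AnchorParser` and `GetHrefs` are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AnchorParser
{
    // Assumption: `fragment` is well-formed XML. Malformed HTML will throw.
    public static List<string> GetHrefs(string fragment)
    {
        // Wrap in a synthetic root because XML allows only one top-level element.
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        return root.Descendants("a")
                   .Select(a => (string)a.Attribute("href")) // null if no href
                   .Where(href => href != null)
                   .ToList();
    }

    public static void Main()
    {
        string html = "<a href=\"a1\" title=\"t1\"> k1 </a>"
                    + "<a href=\"a2\" title=\"t2\"> k2 </a>";

        foreach (string href in GetHrefs(html))
        {
            Console.WriteLine(href); // a1, then a2
        }
    }
}
```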

edited May 23 '17 at 12:30 by Community

answered May 06 '15 at 00:02 by William

This way, `("")` works too, and it's cleaner syntax-wise. I didn't know about the `?` symbol. – dimitris93 May 06 '15 at 00:04
