
Here is my regex:

```
href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))
```

And here is my input string:

```
"<p>dfhdfh</p>\r\n<p><a href=\"/Content/blabla/345/344\">najnov</a></p>\r\n<p>&nbsp;</p>\r\n<p><a href=\"/Content/blabla/345/323:test 1\">test 1&nbsp;</a></p>"
```

But m.Groups contains only:

```
{href="/Content/blabla/345/344"}
{/Content/blabla/345/344}
```

How do I get the second href as well?

Here is my code:

```csharp
Match m = Regex.Match(myString, "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))", RegexOptions.IgnoreCase);
if (m.Success)
{
    for (int ij = 0; ij < m.Groups.Count; ij++)
        myString = myString.Replace(m.Groups[ij].Value.Substring(7), m.Groups[ij].Value.Substring(m.Groups[ij].Value.LastIndexOf("/") + 1));
}
```
petko_stankoski
  • [You shouldn't try to parse HTML with regexes.](http://stackoverflow.com/a/1732454/41071) Use an HTML parser instead, like HTML Agility Pack. – svick Apr 20 '12 at 09:47
  • Also, could you show us your code that actually uses your regex? – svick Apr 20 '12 at 09:48
  • `(?<=href\=")[^"]+?(?=")` You might also try this. – Prix Apr 20 '12 at 10:01
  • Maybe you should explain what you want to achieve; this Replace call using Substring on a capturing group looks quite strange. – stema Apr 20 '12 at 10:11

3 Answers


I tested this using RAD Software Regular Expression Designer.

This regex returns multiple matches, with one capturing group within each match. So you shouldn't try to get your result from the group (named "1"); instead, iterate over the collection of matches and retrieve the value of each match (or of the group within each match).

This is the output:

[Screenshot: output from RAD RegEx Designer]

So you should call Regex.Matches in your code and iterate through the results, rather than calling Regex.Match.
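A minimal sketch of the difference, assuming .NET's `System.Text.RegularExpressions` and C# 9+ top-level statements. The input and pattern are the ones from the question (the pattern rewritten as a verbatim string); the variable names are illustrative. `Regex.Match` stops at the first match, while `Regex.Matches` returns both hrefs:

```csharp
using System;
using System.Text.RegularExpressions;

// The question's input string.
string input = "<p>dfhdfh</p>\r\n<p><a href=\"/Content/blabla/345/344\">najnov</a></p>\r\n" +
               "<p>&nbsp;</p>\r\n<p><a href=\"/Content/blabla/345/323:test 1\">test 1&nbsp;</a></p>";

// Same pattern as in the question, as a verbatim string ("" is an escaped quote).
string pattern = @"href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))";

// Regex.Match returns only the first match in the input.
Match first = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
Console.WriteLine(first.Groups[1].Value); // /Content/blabla/345/344

// Regex.Matches returns a MatchCollection holding every match.
MatchCollection all = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
foreach (Match m in all)
{
    // Group 1 holds the href value without the surrounding quotes.
    Console.WriteLine(m.Groups[1].Value);
}
```

The `foreach` prints both `/Content/blabla/345/344` and `/Content/blabla/345/323:test 1`.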

GShenanigan

Apart from the HTML-vs-regex issue, to get all results at once use Matches; that method returns a MatchCollection that contains every Match found.

See "The MatchCollection and Match Objects" on MSDN.
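As a sketch of two ways to walk every match (assuming C# 9+ top-level statements and the question's input and pattern; variable names are illustrative): a `MatchCollection` exposes `Count` and an indexer, and alternatively you can start from `Regex.Match` and step through results with `Match.NextMatch()` until `Success` is false.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "<p>dfhdfh</p>\r\n<p><a href=\"/Content/blabla/345/344\">najnov</a></p>\r\n" +
               "<p>&nbsp;</p>\r\n<p><a href=\"/Content/blabla/345/323:test 1\">test 1&nbsp;</a></p>";
var regex = new Regex(@"href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))", RegexOptions.IgnoreCase);

// Option 1: MatchCollection supports Count and indexing.
MatchCollection matches = regex.Matches(input);
Console.WriteLine(matches.Count); // 2
for (int i = 0; i < matches.Count; i++)
    Console.WriteLine(matches[i].Groups[1].Value);

// Option 2: begin with Match and advance with NextMatch() while Success is true.
var viaNext = new List<string>();
for (Match m = regex.Match(input); m.Success; m = m.NextMatch())
    viaNext.Add(m.Groups[1].Value);
```

Both loops visit the same two matches; `Matches` is usually the more readable choice.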

svick
stema

I'm going to assume the original string is this:

```html
<p>dfhdfh</p>
<p><a href="/Content/blabla/345/344">najnov</a></p>
<p>&nbsp;</p>
<p><a href="/Content/blabla/345/323:test 1">test 1&nbsp;</a></p>
```

...and that what you posted is the string literal you would use to create that string. Getting all the href attributes out of it is as simple as this:

```csharp
Regex r = new Regex(@"href\s*=\s*(?:""(?<HREF>[^""]*)""|(?<HREF>\S+))");

foreach (Match m in r.Matches(htmlString))
{
  Console.WriteLine(m.Groups["HREF"].Value);
}
```

I changed the name of the capturing group to HREF to make it clear that we're retrieving the group by its name, not by its number.

As you can see, you're doing a lot of work you don't need to do.
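To illustrate the point about group names and numbers, here is a small sketch (assuming C# 9+ top-level statements; the one-tag input is a shortened piece of the question's string). It also explains the two entries the asker saw in `m.Groups`: `Groups[0]` is always the whole match, and both `(?<1>...)` alternatives write into the single group 1. With the named version, the same value is reachable as `Groups["HREF"]`.

```csharp
using System;
using System.Text.RegularExpressions;

string tag = "<a href=\"/Content/blabla/345/344\">najnov</a>";

var numbered = new Regex(@"href\s*=\s*(?:""(?<1>[^""]*)""|(?<1>\S+))");
var named = new Regex(@"href\s*=\s*(?:""(?<HREF>[^""]*)""|(?<HREF>\S+))");

Match m1 = numbered.Match(tag);
// Groups[0] is the entire match; Groups[1] is the (?<1>...) capture.
Console.WriteLine(m1.Groups[0].Value); // href="/Content/blabla/345/344"
Console.WriteLine(m1.Groups[1].Value); // /Content/blabla/345/344
Console.WriteLine(m1.Groups.Count);    // 2

Match m2 = named.Match(tag);
// Both alternatives share the HREF name, so one lookup covers quoted and unquoted values.
Console.WriteLine(m2.Groups["HREF"].Value);            // /Content/blabla/345/344
Console.WriteLine(named.GroupNumberFromName("HREF")); // 1
```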

Alan Moore