0

I have a string:

<a href="mailto:me@company.com">Joel Werner</a>

and I need to strip everything except my name.

The expression I have now almost does that:

var pattern = new System.Text.RegularExpressions.Regex(">(?<name>.+?)<");

But when I match it against the string, I get

>Joel Werner<

What am I missing? I do not really like regular expressions.

Joel Werner

4 Answers

1

The match value is the whole matched text, including the > and <. Use the named group to get just the name:

var name = pattern.Match(input).Groups["name"].Value;

You can also check the match's Success property before referencing the group:

var match = pattern.Match(input);
if (match.Success)
    name = match.Groups["name"].Value;

You can also reference the group by index: Groups[1].
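
To make the difference between the whole match and the group concrete, here is a minimal sketch using the pattern from the question. The class name NameExtractor and the method ExtractName are hypothetical names chosen for illustration; they are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameExtractor
{
    // Same pattern as in the question: lazily capture whatever sits between '>' and '<'.
    private static readonly Regex Pattern = new Regex(">(?<name>.+?)<");

    // Hypothetical helper: returns the captured name, or null when nothing matches.
    public static string ExtractName(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Groups["name"].Value : null;
    }

    public static void Main()
    {
        string input = "<a href=\"mailto:me@company.com\">Joel Werner</a>";
        Console.WriteLine(Pattern.Match(input).Value); // whole match: >Joel Werner<
        Console.WriteLine(ExtractName(input));         // named group only: Joel Werner
    }
}
```

Returning null on a failed match is just one possible choice here; throwing or returning an empty string would work as well, depending on the caller.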

Sergey Berezovskiy
1

If you don't like regular expressions, don't use them in this case. Parsing HTML with regular expressions is usually a bad idea. See this answer on why.

Using CsQuery:

var dom = CQ.Create("<a href=\"mailto:me@company.com\">Joel Werner</a>"); // parse the fragment
Console.WriteLine(dom["a"].Text()); // select the anchor and print its text content (the name)

Using the built-in LINQ to XML:

XDocument doc = XDocument.Parse("<a href=\"mailto:me@company.com\">Joel Werner</a>");
Console.WriteLine(doc.Element("a").Value); // the element's text content: Joel Werner
Benjamin Gruenbaum
0

Use this regex:

<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>

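Then use the second group; the first group is the tag name. Since the pattern only lists uppercase letters, pass RegexOptions.IgnoreCase so it also matches a lowercase tag such as <a>.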
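
As a rough sketch of how that pattern could be applied to the string from the question (the class name TagContent and the method InnerText are hypothetical, chosen only for this example):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagContent
{
    // Group 1 is the tag name, group 2 is the text between the opening and closing tag.
    // IgnoreCase is required here because [A-Z] alone would not match the lowercase "a".
    private static readonly Regex Pattern = new Regex(
        @"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the content of the first matched element, or null.
    public static string InnerText(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Groups[2].Value : null;
    }

    public static void Main()
    {
        string input = "<a href=\"mailto:me@company.com\">Joel Werner</a>";
        Console.WriteLine(InnerText(input)); // Joel Werner
    }
}
```

The backreference \1 makes the closing tag match the same tag name as the opening one, so this only handles a simple, non-nested element.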

Senad Meškin
0
var input = "<a href=\"mailto:me@company.com\">Joel Werner</a>";
var pattern = new System.Text.RegularExpressions.Regex(@"<a\shref=""(?<url>.*?)"">(?<name>.*?)</a>");
var match = pattern.Match(input);
var name = match.Groups["name"].Value;
kodelab