EDIT (inspired by @AndyKorneyev's answer):
With HtmlAgilityPack, you can obtain the <span> tags you need by selecting those whose id attribute has the value myspan.
var txt = "<span id=\"myspan\">2,500</span><span id=\"myspan\">500</span>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(txt);
foreach (var node in doc.DocumentNode.ChildNodes.Where(p =>
    p.Name == "span" && p.HasAttributes && p.GetAttributeValue("id", string.Empty) == "myspan"))
{
    var val = node.InnerHtml;
    Console.WriteLine(val.Replace(",", string.Empty));
}
Outputs:
2500
500
ORIGINAL:
Here is an approach without a regex, using XElement and Replace:
var txxt = "<span id=\"myspan\">2,500</span>\r\n<span id=\"myspan\">500</span>";
var Xelt = XElement.Parse("<root>" + txxt + "</root>");
var vals = Xelt.DescendantsAndSelf("span").Select(p => p.Value.Replace(",", string.Empty)).ToList();
Output: vals contains 2500 and 500.
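For reference, the XElement approach can be wrapped into a small console program along these lines. This is only a sketch: the names SpanValues and Extract are made up for illustration, and it assumes the fragment is well-formed XML, because XElement.Parse throws an XmlException on malformed markup (which real-world HTML often is).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SpanValues
{
    // Hypothetical helper: wraps the fragment in a dummy <root> so that
    // several sibling <span> elements parse as one XML document.
    public static List<string> Extract(string fragment)
    {
        var root = XElement.Parse("<root>" + fragment + "</root>");
        // The dummy root is never a <span>, so Descendants is enough here.
        return root.Descendants("span")
                   .Select(p => p.Value.Replace(",", string.Empty))
                   .ToList();
    }

    public static void Main()
    {
        var txxt = "<span id=\"myspan\">2,500</span>\r\n<span id=\"myspan\">500</span>";
        foreach (var v in Extract(txxt))
            Console.WriteLine(v); // prints 2500, then 500
    }
}
```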

Or a very weird regex approach removing all commas and tags:
var result = Regex.Replace(txxt, @"(?><(?:\b|/)[^<]*>|,)", string.Empty);
The result is 2500 and 500, still separated by the original line break (\r\n).
And if, for some reason, you insist on your approach, just use look-arounds:
var rgx = new Regex(@"(?s)(?<=<\bspan\b[^<]*?\bmyspan\b[^<]*?\>)(?<numbers>[,0-9]*?)(?=</span>)");
var matched = rgx.Matches(txxt).Cast<Match>().Select(p => p.Value.Replace(",", string.Empty)).ToList();
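Note that the look-arounds are zero-width, so p.Value and the named group numbers hold the same text. If you want actual integers rather than strings, something like the following sketch might do. The names SpanNumbers and Parse are hypothetical, and it assumes the comma is a thousands separator, so it parses with the invariant culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpanNumbers
{
    // Same pattern as above; only the text between the tags is consumed.
    private static readonly Regex Rgx = new Regex(
        @"(?s)(?<=<\bspan\b[^<]*?\bmyspan\b[^<]*?\>)(?<numbers>[,0-9]*?)(?=</span>)");

    // Hypothetical helper: reads the "numbers" group and converts it to int,
    // skipping anything that does not parse (e.g. an empty span).
    public static List<int> Parse(string html)
    {
        var result = new List<int>();
        foreach (var m in Rgx.Matches(html).Cast<Match>())
        {
            int n;
            if (int.TryParse(m.Groups["numbers"].Value,
                             NumberStyles.AllowThousands,
                             CultureInfo.InvariantCulture, out n))
            {
                result.Add(n);
            }
        }
        return result;
    }

    public static void Main()
    {
        var txxt = "<span id=\"myspan\">2,500</span>\r\n<span id=\"myspan\">500</span>";
        foreach (var n in Parse(txxt))
            Console.WriteLine(n); // prints 2500, then 500
    }
}
```

Using TryParse rather than Parse is a design choice for this sketch: the pattern also allows empty or comma-only content, and those are silently skipped instead of throwing.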