string a = @"<div class=""b-stats""><a name=""wm-table-used""></a><table class=""stats""><thead><tr><th class=""lb""><i class=""lgap""></i></th><th class=""cell  first-cell""><a title=""Сайт"" class=""sortable"" href=""/limit_info.xml?&amp;order_by=host-name&amp;order_by_mode=desc"">Сайт</a></th><th class=""cell number ""><a title=""Лимит"" class=""sortable"" href=""/limit_info.xml?&amp;order_by=host-limit&amp;order_by_mode=desc"">Лимит</a></th><th class=""cell number ""><a title=""Получатель лимита""></a></th><th class=""rb""><i class=""rgap""></i></th></tr><tr class=""shadow""><td class=""head-shadow"" colspan=""5""></td></tr></thead><tbody><tr><td class=""lb""></td><td class=""cell "">rollstavni-msk.ru</td><td class=""cell number"">10</td><td class=""cell"" style=""text-align: right; width: 35%""><a href=""/delegate_limit.xml?host=19814830"">Передать лимит</a></td><td class=""lb""></td></tr><tr class=""another""><td class=""lb""></td><td class=""cell "">tapetum.ru</td><td class=""cell number"">10</td><td class=""cell"" style=""text-align: right; width: 35%""><a href=""/delegate_limit.xml?host=19888241"">Передать лимит</a></td><td class=""lb""></td></tr><tr><td class=""lb""></td><td class=""cell "">www.maga.ru</td><td class=""cell number"">400</td><td class=""cell"" style=""text-align: right; width: 35%""><a href=""/delegate_limit.xml?host=5485565"">Передать лимит</a></td><td class=""lb""></td></tr><tr class=""another""><td class=""lb""></td><td class=""cell "">stilemaster.ru</td><td class=""cell number"">0</td><td class=""cell"" style=""text-align: right; width: 35%""><a href=""/delegate_limit.xml?host=19886870"">Передать лимит</a></td><td class=""lb""></td></tr></tbody><tfoot><tr><th class=""lb""></th><th colspan=""3""></th><th class=""rb""></th></tr></tfoot></table></div><div class=""b-static-text"">";
Regex rgx = new Regex(@"<td class=""cell "">(?<domain>[^""]+)<\/td.+number"">(?<id>[^""]+)<\/td", RegexOptions.Singleline);
MatchCollection matches = rgx.Matches(a);

Why is matches.Count equal to 1? I expected it to be 4, one match per table row.

1 Answer


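As has already been pointed out, you shouldn't use regular expressions to parse HTML / XML. That said, the problem is that .+ is greedy: it consumes as many characters as it can while still allowing the rest of the pattern to match. Here that means it runs all the way to the last number"> in the input, so the whole table collapses into a single match that pairs the first domain with the last limit.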

Use a non-greedy quantifier (.+?) in your pattern instead:

@"<td class=""cell "">(?<domain>[^""]+)<\/td.+?number"">(?<id>[^""]+)<\/td"
                                              ^ see the ? here

Your matches collection will now contain 4 items.
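
To make the difference concrete, here is a minimal sketch that runs both patterns side by side. It assumes a shortened, two-row version of the table from the question (the html variable and its contents are trimmed for illustration and are not the original input) and uses C# top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: a shortened two-row stand-in for the table in the question.
string html =
    @"<tr><td class=""cell "">rollstavni-msk.ru</td><td class=""cell number"">10</td></tr>" +
    @"<tr><td class=""cell "">www.maga.ru</td><td class=""cell number"">400</td></tr>";

// Greedy: .+ extends to the last number""> in the input, swallowing every row in between.
var greedy = new Regex(
    @"<td class=""cell "">(?<domain>[^""]+)<\/td.+number"">(?<id>[^""]+)<\/td",
    RegexOptions.Singleline);

// Lazy: .+? stops at the first number""> that follows each domain cell.
var lazy = new Regex(
    @"<td class=""cell "">(?<domain>[^""]+)<\/td.+?number"">(?<id>[^""]+)<\/td",
    RegexOptions.Singleline);

MatchCollection greedyMatches = greedy.Matches(html);
MatchCollection lazyMatches = lazy.Matches(html);

Console.WriteLine($"greedy: {greedyMatches.Count} match(es)");
Console.WriteLine($"lazy:   {lazyMatches.Count} match(es)");

// Print the domain/limit pair captured for each row.
foreach (Match m in lazyMatches)
{
    Console.WriteLine($"{m.Groups["domain"].Value}: {m.Groups["id"].Value}");
}
```

With this input the greedy pattern yields a single match (first domain, last limit), while the lazy pattern yields one match per row.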
