I'm trying to parse this HTML code:

<tr>
  <td>sdafsadf</td>
  <td>12121</td>
  <td>sdafasdf</td>
  <td>32222</td>
  <td>99999</td>
</tr>

I want to get ONLY the second <td>, i.e. <td>12121</td>. I found this link: Support of \K in regex, and .NET doesn't support the \K escape sequence. I wrote this regex:

(?s)(^(?:(.*?)(\K<td)){2})(.*?</td>)

It works fine on http://www.regexr.com/. Please help me use this regex, or a similar one, in .NET.

This is driving me crazy and wearing me out.

Thanks and regards,

AneEx
  • Any reason you're not using an HTML parser instead? – Jon Skeet Mar 25 '14 at 20:27
  • Thanks for the reply. Yes, I need to use regex; unfortunately, HtmlAgilityPack doesn't work for me :( – AneEx Mar 26 '14 at 16:42
  • Why not? Please give more details. Whenever you approach a problem with a known-to-be-tricky-or-infeasible solution instead of a "standard" solution, you should explain clearly *why* you've chosen that route. There may be a tweak to the standard solution that would save you a lot of hassle. – Jon Skeet Mar 26 '14 at 16:44
  • Thanks for your interest. Well, basically because I'm maintaining a program written by others. The program takes raw strings and parses them with regex, so using HtmlAgilityPack would mean rewriting all the code. That isn't worth the work! :'( This is the only problem I've run into using regex. – AneEx Mar 26 '14 at 17:22
  • Okay, so it's not that it won't work, it's that you're persisting with a known-bad solution. Would you actually have to rewrite the whole code, or just part of it? What's the lifetime of this program likely to be? – Jon Skeet Mar 26 '14 at 17:29

1 Answer

I'd use an HTML parser library like: http://htmlagilitypack.codeplex.com/

(More on that here)

But simply based on your snippet, here's one way:

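 // Requires: using System; using System.Text.RegularExpressions;
 var str = @"
 <tr>
    <td>sdafsadf</td>
    <td>12121</td>
    <td>66666</td>
    <td>32222</td>
    <td>99999</td>
 </tr>";
 // Each match is one <td>...</td> cell; group 1 captures the cell text.
 var re = new Regex(@"<td>([^<]+)</td>", RegexOptions.IgnoreCase);
 var cells = re.Matches(str); // run the regex once and index into the results
 Console.WriteLine(cells[1].Groups[1].Value); // second cell: 12121
 Console.WriteLine(cells[2].Groups[1].Value); // third cell: 66666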
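
If exactly one match is required, as the question asks, here is a rough sketch of two ways to get it without \K. It is not taken from the original program: the class, method, and constant names are hypothetical, and both patterns assume plain <td> tags without attributes, as in the snippet. The first anchors at \A and captures the wanted cell in a group; the second moves the prefix into a lookbehind, which .NET allows to be variable-length, so the whole match is the cell itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SecondCellDemo
{
    // Sample input copied from the question.
    public const string Html = @"
<tr>
  <td>sdafsadf</td>
  <td>12121</td>
  <td>sdafasdf</td>
  <td>32222</td>
  <td>99999</td>
</tr>";

    // Variant 1: \A pins the match to the start of the input, so there can be
    // at most one match. The lazy prefix skips past the first <td>, and
    // group 1 captures the second cell.
    public const string GroupPattern = @"\A.*?<td>.*?(<td>.*?</td>)";

    // Variant 2: the lookbehind requires exactly one "<td>" between the start
    // of the input and the current position, so only the second cell can
    // match, and the match itself is the cell (no group needed).
    public const string LookbehindPattern =
        @"(?<=\A(?:(?!<td>).)*<td>(?:(?!<td>).)*)<td>.*?</td>";

    public static string ViaGroup(string html)
    {
        // Singleline lets '.' cross line breaks, like (?s) in the question.
        Match m = Regex.Match(html, GroupPattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static string ViaLookbehind(string html)
    {
        Match m = Regex.Match(html, LookbehindPattern, RegexOptions.Singleline);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ViaGroup(Html));      // <td>12121</td>
        Console.WriteLine(ViaLookbehind(Html)); // <td>12121</td>
    }
}
```

Changing which cell is selected means repeating the prefix: for the Nth cell, variant 1 needs N-1 copies of `.*?<td>` after `\A`, which can be written as `(?:.*?<td>){N-1}`.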
zagros
  • Thanks for the reply. In this code sample the regex gets ALL the TD matches, but the regex `(?s)(^(?:(.*?)(\K)` returns ONLY ONE match. That's what I need: a regex that returns only one group and one match, but without using \K in .NET – AneEx Mar 26 '14 at 16:47