I want to extract the values inside the <td> elements from this piece of HTML markup.

<tr id="pair_169">
   <td id="cr_12cl_last169">16,294.61</td>
   <td>16,294.61</td><td>16,318.11</td>
   <td class="">16,225.25</td>
   <td class="bold greenFont">73.47</td>
   <td class="bold greenFont">0.45%</td>
   <td id="cr_12cl_date169">23/12</td>
</tr>

What would be the best regex pattern for this?

Kamran1358
You could try using an HTML parser; see [here](http://stackoverflow.com/questions/56107/what-is-the-best-way-to-parse-html-in-c) and [here](http://stackoverflow.com/questions/6063203/parsing-html-with-c-net). – Runcorn Dec 24 '13 at 05:32
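
As a rough illustration of the parser route suggested in the comment, the sketch below reads the row with `System.Xml.Linq` instead of a regex. This is a stand-in, not the HTML parser the linked questions discuss: it assumes the fragment is well-formed XML (true for the row in the question, often false for real-world HTML). The class name `TdXmlDemo` and the helper `ExtractCells` are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TdXmlDemo
{
    // Sample row copied from the question.
    public const string SampleRow = @"<tr id=""pair_169"">
   <td id=""cr_12cl_last169"">16,294.61</td>
   <td>16,294.61</td><td>16,318.11</td>
   <td class="""">16,225.25</td>
   <td class=""bold greenFont"">73.47</td>
   <td class=""bold greenFont"">0.45%</td>
   <td id=""cr_12cl_date169"">23/12</td>
</tr>";

    // Assumption: rowMarkup is a single well-formed <tr> element.
    // XElement.Parse throws an XmlException on malformed markup.
    public static List<string> ExtractCells(string rowMarkup)
    {
        XElement row = XElement.Parse(rowMarkup);
        return row.Elements("td")
                  .Select(td => td.Value.Trim())
                  .ToList();
    }

    public static void Main()
    {
        // Prints the seven cell values, one per line.
        foreach (string value in ExtractCells(SampleRow))
        {
            Console.WriteLine(value);
        }
    }
}
```

For markup that is not well-formed, a dedicated HTML parser as described in the linked questions is likely the safer choice.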

3 Answers

You can use the following code:

const string pattern = @"<td\b[^>]*?>(?<V>[\s\S]*?)</\s*td>";
foreach (Match match in Regex.Matches(inputText, pattern, RegexOptions.IgnoreCase))
{
    string value = match.Groups["V"].Value;

    Console.WriteLine(value);
}
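
For a self-contained check, here is a sketch that runs the same pattern against the row from the question. The class name `TdRegexDemo`, the `ExtractCells` helper, and the `Trim()` call are illustrative additions, not part of the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TdRegexDemo
{
    // Sample row copied from the question.
    public const string SampleRow = @"<tr id=""pair_169"">
   <td id=""cr_12cl_last169"">16,294.61</td>
   <td>16,294.61</td><td>16,318.11</td>
   <td class="""">16,225.25</td>
   <td class=""bold greenFont"">73.47</td>
   <td class=""bold greenFont"">0.45%</td>
   <td id=""cr_12cl_date169"">23/12</td>
</tr>";

    // Same pattern as above: an opening <td ...> tag with optional attributes,
    // then a lazy capture (group V) up to the closing tag.
    private const string Pattern = @"<td\b[^>]*?>(?<V>[\s\S]*?)</\s*td>";

    public static List<string> ExtractCells(string html)
    {
        var values = new List<string>();
        foreach (Match match in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            // Trim() is an addition here; it drops whitespace around the cell text.
            values.Add(match.Groups["V"].Value.Trim());
        }
        return values;
    }

    public static void Main()
    {
        // Prints the seven cell values, one per line.
        foreach (string value in ExtractCells(SampleRow))
        {
            Console.WriteLine(value);
        }
    }
}
```

Because the capture is lazy, the two cells that share a line in the sample come back as separate values.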
Usman Zafar

Try this regex:

<td>(.*?)</td>

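Or this one, which matches a whole <tr> row that begins with a <td> rather than individual cells. Use it with RegexOptions.IgnoreCase and RegexOptions.Singleline so that it matches lowercase tags and rows that span several lines: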

(?<1><TR[^>]*>\s*<td.*?</tr>)
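
To see how these patterns behave on the row from the question, the sketch below counts their matches. The class name `TdPatternComparison`, the `CountMatches` helper, and the attribute-tolerant variant `<td\b[^>]*>(.*?)</td>` are hypothetical additions for comparison, not part of the patterns above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TdPatternComparison
{
    // Sample row copied from the question.
    public const string SampleRow = @"<tr id=""pair_169"">
   <td id=""cr_12cl_last169"">16,294.61</td>
   <td>16,294.61</td><td>16,318.11</td>
   <td class="""">16,225.25</td>
   <td class=""bold greenFont"">73.47</td>
   <td class=""bold greenFont"">0.45%</td>
   <td id=""cr_12cl_date169"">23/12</td>
</tr>";

    public const string BareTd = @"<td>(.*?)</td>";
    // Hypothetical variant: also accepts attributes on the opening tag.
    public const string AnyTd = @"<td\b[^>]*>(.*?)</td>";
    public const string WholeRow = @"(?<1><TR[^>]*>\s*<td.*?</tr>)";

    public static int CountMatches(string html, string pattern, RegexOptions options)
    {
        return Regex.Matches(html, pattern, options).Count;
    }

    public static void Main()
    {
        // Only the two attribute-free cells match the bare pattern.
        Console.WriteLine(CountMatches(SampleRow, BareTd, RegexOptions.None)); // 2

        // The variant picks up all seven cells.
        Console.WriteLine(CountMatches(SampleRow, AnyTd, RegexOptions.None)); // 7

        // The row pattern needs IgnoreCase (the sample uses lowercase <tr>)
        // and Singleline (so that . can cross line breaks).
        Console.WriteLine(CountMatches(SampleRow, WholeRow, RegexOptions.None)); // 0
        Console.WriteLine(CountMatches(SampleRow, WholeRow,
            RegexOptions.IgnoreCase | RegexOptions.Singleline)); // 1
    }
}
```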
Vignesh Kumar A

I know this is an old thread, but this pattern helped me in a similar situation:

<td\b[^>]class=".*?>(.*?)<\/td>
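
A small sketch of what this pattern picks up on the row from the question: only the cells whose first attribute is `class`. The class name `TdClassFilterDemo`, the `Extract` helper, and the second pattern, which targets one specific class value, are hypothetical additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TdClassFilterDemo
{
    // Sample row copied from the question.
    public const string SampleRow = @"<tr id=""pair_169"">
   <td id=""cr_12cl_last169"">16,294.61</td>
   <td>16,294.61</td><td>16,318.11</td>
   <td class="""">16,225.25</td>
   <td class=""bold greenFont"">73.47</td>
   <td class=""bold greenFont"">0.45%</td>
   <td id=""cr_12cl_date169"">23/12</td>
</tr>";

    // Pattern from the answer: a <td> whose first attribute is class.
    public const string AnyClass = @"<td\b[^>]class="".*?>(.*?)<\/td>";
    // Hypothetical variant: only cells with class="bold greenFont".
    public const string GreenOnly = @"<td\b[^>]*\bclass=""bold greenFont""[^>]*>(.*?)</td>";

    public static List<string> Extract(string html, string pattern)
    {
        var values = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern))
        {
            // Group 1 holds the text between the opening and closing tags.
            values.Add(match.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        // Prints: 16,225.25 | 73.47 | 0.45%
        Console.WriteLine(string.Join(" | ", Extract(SampleRow, AnyClass)));

        // Prints: 73.47 | 0.45%
        Console.WriteLine(string.Join(" | ", Extract(SampleRow, GreenOnly)));
    }
}
```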
Rushabh