
I am trying to extract a value from this HTML input string:

            <tbody>
     <tr  class="row0 first" >
                     <td  class="col0 first" data-colid="0" >
<span>Previous close</span>
        </td>
         <td  class="col1 last" data-colid="1" >
<span>39.64</span>
        </td>
     </tr>
     <tr  class="row1" >
                     <td  class="col0 first" data-colid="0" >
<span>Open</span>
        </td>
         <td  class="col1 last" data-colid="1" >
<span>39.50</span>
        </td>
     </tr>
     <tr  class="row2" >
                     <td  class="col0 first" data-colid="0" >

I tried this Regular Expression:

\<span\>Previous\sclose\<\/span\>[\w\n\r\<\/\s\>\=\"\-]+\<span\>[0-9\.]+\<\/span\>

and the output I am getting is:

<span>Previous close</span>
        </td>
         <td  class="col1 last" data-colid="1" >
<span>39.64</span>

I am interested only in the value 39.64. Is there a way to use a regex to match only the "39.64", without getting the rest of the HTML?

Thx

eitan barazani
  • Don't use regex. Use some toolkit to parse the data, like [HTML Agility Pack](http://htmlagilitypack.codeplex.com/) – crashmstr May 09 '14 at 19:20
  • `(?<=Previous\sclose<\/span>[\w\n\r\<\/\s\>\=\"\-]+)[0-9\.]+(?=<\/span>)` – Ulugbek Umirov May 09 '14 at 19:21
  • Ulugbek, thanks. I tried your pattern with a slight change. It works in Expresso, but not with Regex in C# in Visual Studio: `@"(?<=Previous\sclose<\/span>[\s\w\n\r<>/"=-]+)[0-9\.]+(?=<\/span>)"`. The problem seems to be the `"` character. – eitan barazani May 10 '14 at 00:13
  • @eitanbarazani Yes, because of the `@` prefix you need to double each `"`, as follows: `@"(?<=Previous\sclose<\/span>[\s\w\n\r<>/""=-]+)[0-9\.]+(?=<\/span>)"` – Ulugbek Umirov May 10 '14 at 05:23
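
For anyone who wants to try the regex route from the comments, here is a minimal sketch, not a definitive solution. It assumes a pattern along the lines of the one discussed above, with the trailing lookahead written out as `(?=</span>)`, and it adds a capture-group variant that does not depend on .NET's variable-length lookbehind. The class and method names (`PreviousCloseRegexDemo`, `WithLookbehind`, `WithGroup`) are made up for illustration, and the sample HTML is the first row from the question with the indentation trimmed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreviousCloseRegexDemo
{
    // First row from the question. Inside a verbatim string (@"..."),
    // every double quote has to be written as "".
    private const string Html = @"<tr  class=""row0 first"" >
<td  class=""col0 first"" data-colid=""0"" >
<span>Previous close</span>
</td>
<td  class=""col1 last"" data-colid=""1"" >
<span>39.64</span>
</td>
</tr>";

    // Variant 1: lookbehind + lookahead, so the match itself is only the number.
    // The (?=</span>) lookahead matters: without it, the "1" in "col1" would match first.
    public static string WithLookbehind(string html)
    {
        Match m = Regex.Match(html,
            @"(?<=Previous\sclose</span>[\s\w<>/""=-]+)[0-9.]+(?=</span>)");
        return m.Success ? m.Value : null; // null when nothing matched
    }

    // Variant 2: match the surrounding markup, but read only capture group 1.
    public static string WithGroup(string html)
    {
        Match m = Regex.Match(html,
            @"Previous\sclose</span>[\s\w<>/""=-]+?<span>([0-9.]+)</span>");
        return m.Success ? m.Groups[1].Value : null; // null when nothing matched
    }

    public static void Main()
    {
        Console.WriteLine(WithLookbehind(Html)); // 39.64
        Console.WriteLine(WithGroup(Html));      // 39.64
    }
}
```

Both variants break as soon as the markup between the two spans contains a character outside the character class, which is the main reason the other comment suggests a parser instead.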

1 Answer


You should parse the HTML instead of trying to find values inside it with a regular expression.

Take a look at HtmlAgilityPack, a .NET library for parsing HTML.
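
As a rough sketch of what that could look like (an illustration under stated assumptions, not code from the original answer): the snippet assumes the HtmlAgilityPack NuGet package is referenced, and the names `PreviousCloseParser` and `GetValue` are hypothetical. It loads the markup, finds the cell whose `<span>` holds the label, and reads the `<span>` in the next cell of the same row.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class PreviousCloseParser
{
    // Returns the text of the value cell that follows the cell containing `label`,
    // or null if no such row exists.
    // Assumption: `label` contains no apostrophe, since it is pasted into the XPath.
    public static string GetValue(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // <td> whose <span> text equals the label -> next <td> sibling -> its <span>
        string xpath = "//td[span[normalize-space(.)='" + label + "']]"
                     + "/following-sibling::td[1]/span";

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Two rows from the question, wrapped in <table> so the fragment is complete.
        string html = @"<table><tbody>
<tr class=""row0 first""><td class=""col0 first"" data-colid=""0""><span>Previous close</span></td>
<td class=""col1 last"" data-colid=""1""><span>39.64</span></td></tr>
<tr class=""row1""><td class=""col0 first"" data-colid=""0""><span>Open</span></td>
<td class=""col1 last"" data-colid=""1""><span>39.50</span></td></tr>
</tbody></table>";

        Console.WriteLine(GetValue(html, "Previous close"));
        Console.WriteLine(GetValue(html, "Open"));
    }
}
```

Unlike a regex, the lookup does not care about whitespace, attribute order, or extra attributes on the cells, and the same helper works for the other rows by changing the label.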

sergiogarciadev