Here is my regular expression:

Dim TableHeaderExpression As String = "<th[^>]*>(.*?)</th>"

And here is my HTML:

<th class="seller-col">
 <b>Relevanz</b>
 <span class="ps-sprite ps-sprite-sortdw" title=""></span>
 </th>

This expression gives me everything inside the th tag, so it outputs:

<b>Relevanz</b>
     <span class="ps-sprite ps-sprite-sortdw" title=""></span>

How can I make it output only:

Relevanz

In other words, ignore everything inside <th> except what is inside <b>.

user1570048
  • Regex is a poor option for [parsing HTML](http://stackoverflow.com/a/1732454/1583). – Oded Oct 29 '12 at 20:54
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Wug Oct 29 '12 at 20:55
  • @Oded No, it's not. I am using it to transform an HTML table into a DataTable, and so far it works perfectly. – user1570048 Oct 29 '12 at 20:56
  • Use [Html Agility Pack](http://htmlagilitypack.codeplex.com/). – Tim Schmelter Oct 29 '12 at 20:58
  • @user1570048 Yes, it is a poor option. Read the linked question for various detailed explanations of why. – Wug Oct 29 '12 at 20:59
  • Well, I have finished writing the function and tested it on a simple table, and it works perfectly. I will only use this function for the tables on my website, so they always have the same syntax. I understand it's not a good option, but can someone answer the question? :D – user1570048 Oct 29 '12 at 21:02

1 Answer

Instead of using Regex for parsing HTML (not the best option), use the HTML Agility Pack to parse and query the HTML.

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).
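As a rough illustration of that approach, here is a minimal sketch for the header cell in the question. It assumes the HtmlAgilityPack package is referenced in the project. The module name `ThHeaderDemo` and the surrounding `<table><tr>` wrapper are made up for the example, and the XPath `//th/b` assumes the header text always sits in a `<b>` directly under the `<th>`, as in the question's markup.

```vb
Imports System
Imports HtmlAgilityPack

Module ThHeaderDemo
    Sub Main()
        ' Sample markup modelled on the header cell from the question.
        ' The <table><tr> wrapper is an assumption added for the example.
        Dim html As String =
            "<table><tr><th class=""seller-col"">" &
            "<b>Relevanz</b>" &
            "<span class=""ps-sprite ps-sprite-sortdw"" title=""""></span>" &
            "</th></tr></table>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Select every <b> that is a direct child of a <th>.
        ' SelectNodes returns Nothing when there is no match, so check first.
        Dim boldNodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//th/b")
        If boldNodes IsNot Nothing Then
            For Each node As HtmlNode In boldNodes
                ' InnerText is the text content without any markup.
                Console.WriteLine(node.InnerText.Trim())
            Next
        End If
    End Sub
End Module
```

Because the query targets only the `<b>` element, the sibling `<span>` is never part of the result, so no extra filtering of the cell contents is needed.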

Community
Oded