
Hi all. I am trying to transform HTML to XML, meaning I want to extract all elements that contain text. The code below is not working. Maybe someone has the answer?

System.Xml.Linq.XElement query1 = new System.Xml.Linq.XElement("RawHTMLData",
    from q in hDoc.Descendants("TABLE")
    where q.HasElements
    select new System.Xml.Linq.XElement("TABLE" + (++i).ToString(),
        from j in q.Elements("TR")
        // Descendants() never returns null, so the second check is always true
        where j.HasElements && j.Descendants("div") != null
        select new System.Xml.Linq.XElement("Row",
            from hh in j.Descendants("div")
            // does not compile: a query "where" clause needs a bool expression, not a lambda
            where tt => j.Descendants("div").Contains(hh.Value)
            select(TT(hh)))));
Ondrej Janacek
guy

3 Answers


You cannot use LINQ to XML to parse HTML, because HTML is not necessarily well-formed XML.
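
To illustrate the point, here is a minimal sketch (the class and method names `XmlCheck` and `IsWellFormedXml` are made up for this example). It feeds the same fragment to `XDocument.Parse` twice: once as ordinary HTML with an unclosed `<br>`, and once in XHTML style with `<br/>`.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlCheck
{
    // Hypothetical helper: true if XDocument.Parse accepts the markup,
    // false if it throws XmlException because the markup is not well-formed XML.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Valid HTML, but <br> is never closed, so the XML parser rejects it.
        string html = "<table><tr><td><div>one<br>two</div></td></tr></table>";
        // Same content with the void element self-closed (XHTML style).
        string xhtml = "<table><tr><td><div>one<br/>two</div></td></tr></table>";

        Console.WriteLine(IsWellFormedXml(html));   // False
        Console.WriteLine(IsWellFormedXml(xhtml));  // True
    }
}
```

Unclosed void elements are only one of the ways real-world HTML fails as XML; unquoted attributes and entities such as `&nbsp;` cause the same kind of exception.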

Andrew Bezzub
  • Yes, I know. That problem is already solved by replacing the bad strings, so now I can parse it, but I want to select only the XText nodes, or only the elements that have a value. – guy Mar 28 '10 at 09:44

Not sure if this would work for you, but you might look at using a third-party tool such as HTML Tidy to convert the HTML to XHTML. Then you can treat your HTML like XML. Here is a link to a post discussing that.

Abe Miessler

I think you should use HTML Agility Pack; it has helped me a lot! :)

Old project Page: http://htmlagilitypack.codeplex.com/
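
As a rough sketch of how this could look for the question's table-to-XML case, assuming the `HtmlAgilityPack` NuGet package is referenced: the class name `HtmlTableToXml` and the output element name `Cell` are invented for this example and stand in for whatever the asker's `TT` helper produces.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack; // third-party package, not part of the .NET base library

public static class HtmlTableToXml
{
    // Hypothetical helper: builds <RawHTMLData><TABLEn><Row><Cell>text</Cell>...
    // from every <div> with non-blank text found inside table rows.
    public static XElement Convert(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: accepts HTML that is not valid XML

        int i = 0; // table counter, used to build TABLE1, TABLE2, ...
        return new XElement("RawHTMLData",
            from table in doc.DocumentNode.Descendants("table")
            select new XElement("TABLE" + (++i),
                // Descendants (not direct children) so rows inside <tbody> are found;
                // simplification: rows of nested tables are picked up as well.
                from row in table.Descendants("tr")
                let divs = row.Descendants("div")
                              .Where(d => !string.IsNullOrWhiteSpace(d.InnerText))
                              .ToList()
                where divs.Count > 0 // skip rows without any text-bearing div
                select new XElement("Row",
                    from d in divs
                    // DeEntitize turns entities such as &amp; back into characters
                    select new XElement("Cell", HtmlEntity.DeEntitize(d.InnerText).Trim()))));
    }
}
```

It could be called as `Console.WriteLine(HtmlTableToXml.Convert(htmlString));`. Note that nested `div` elements are each reported, so their text may appear more than once; filtering to the innermost `div` would be a further refinement.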

wp78de
Hosane