
The HTML data returned by the server is in an incorrect format. I captured it with Fiddler and now want to parse it. I tried Fizzler, but it cannot read the class or id of the div tags because of the incorrect format (a sample of the data is quoted further down).

I used a regex to cut off the leading part (resposta = ) and keep just the HTML content, but the parsing still did not work. (Regex : resposta\s=\s"(?(.|\n)\*.*)" )

I suspect the cause is the \ characters in the HTML content: the parser cannot handle the content while they are present.

Here is a small part of the returned HTML data:

resposta = "<div style=\" margin-top:10px;width: 100%; position:relative;height:56px;\"><a href=\"\/WebsiteRoot\/v2\/?hotelinfo&ss=433&landingpage=hfofertafranca\" rel=\"nofollow\" title=\"Offre Speciale\" onClick=\"_gaq.push([\'_trackEvent\', \'Banner Promocode Booking\', \'Click\', \'Click idioma fr\',,false]);\" class=\"addlink det\"><img src=\"\/rootimages\/ofertaespecial_fr.png\" height=\"56\" width=\"891\" alt=\"Offre Speciale\"\/><\/a><\/div><div class=\"tabBoxdisp\" style=\"margin-top:10px\"><div class=\"tabtitdisp redondotop\" style=\"color:#FFF; background:#9D293F;\"><div class=\"float-left\"><h2 class=\"upcase size18\">HF F&Eacute;NIX LISBOA<\/h2> Lisboa\/Portugal<\/div><div class=\"float-right text-right\" style=\"width:350px;\"><img src=\"\/rootimages\/icons\/star_white.png\" width=\"14\" height=\"13\" \/><img src=\"\/rootimages\/icons\/star_white.png\" width=\"14\" height=\"13\" \/><img src=\"\/rootimages\/icons

Here is the full data: http://notepad.cc/share/AReb0eaiqH

So is there any way to remove the \ characters from the HTML content so that the HTML parser can handle it?

Sam Nguyen

1 Answer


The solution might be as simple as replacing every '\"' (backslash followed by a quote) in your data with a plain '"' (quote), for example:

data = data.Replace("\\\"","\"");

You might also have to remove the first and last quote, if they exist.
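The quoted sample also contains \/ and \' sequences, so a single Replace may not be enough. Below is a minimal sketch that combines both steps, assuming the payload is a JavaScript-style string literal of the form resposta = "..." and that only the escapes visible in the sample (\" \' \/) need handling. The names RespostaCleaner and ToPlainHtml are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RespostaCleaner
{
    // Hypothetical helper: strips the resposta = "..." wrapper (if present)
    // and removes the backslash in front of ", ' and /.
    public static string ToPlainHtml(string raw)
    {
        string body = raw.Trim();

        // Drop everything up to and including the opening quote.
        int firstQuote = body.IndexOf('"');
        if (body.StartsWith("resposta", StringComparison.Ordinal) && firstQuote >= 0)
        {
            body = body.Substring(firstQuote + 1);
        }

        // Drop the closing quote, but only if it is not an escaped \" .
        if (body.EndsWith("\"", StringComparison.Ordinal) &&
            !body.EndsWith("\\\"", StringComparison.Ordinal))
        {
            body = body.Substring(0, body.Length - 1);
        }

        // Replace \" with ", \' with ' and \/ with / in one pass.
        return Regex.Replace(body, @"\\([""'/])", "$1");
    }
}
```

The returned string can then be handed to the HTML parser in place of the raw response. Other escape sequences (for example \n or \uXXXX) are left untouched by this sketch and would need their own handling if they occur in the full data.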