I use the following code to load an HTTP response stream into an XmlDocument:
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
Stream responseStream = response.GetResponseStream();
StreamReader responseReader = new StreamReader(responseStream);
String responseString = responseReader.ReadToEnd();
Console.WriteLine(responseString);
Int32 htmlTagIndex = responseString.IndexOf("<html",
StringComparison.OrdinalIgnoreCase);
XmlDocument responseXhtml = new XmlDocument();
responseString = responseString.Substring(htmlTagIndex); // MARK 1
responseString = responseString.Replace("&nbsp;", " "); // MARK 2
responseXhtml.LoadXml(responseString);
return responseXhtml;
The MARK 1 line skips the DOCTYPE declaration by discarding everything before the <html> tag.
The MARK 2 line avoids the error "Reference to undeclared entity 'nbsp'".
Is there a better way to do this? There are too many string operations in the code above.
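For illustration, here is one direction that might cut down on the string handling: let the XML reader resolve the XHTML DTD from a local copy, so that the parser itself handles the DOCTYPE and entities such as &nbsp;. This is only a sketch, not tested against the real page. It assumes .NET 4.0 or later (for XmlPreloadedResolver and DtdProcessing) and a response that is well-formed XHTML 1.0 with a standard DOCTYPE. The names XhtmlLoader, Load and LoadFromUrl are made up for the example.

```csharp
using System.IO;
using System.Net;
using System.Xml;
using System.Xml.Resolvers;

// Hypothetical helper class; the name and shape are assumptions for illustration.
public static class XhtmlLoader
{
    // Parses an XHTML 1.0 stream without touching the markup as a string.
    public static XmlDocument Load(Stream xhtmlStream)
    {
        // Serves the XHTML 1.0 DTDs (and their entity sets, including nbsp)
        // from copies embedded in the framework, so no DTD download is needed.
        var resolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10);

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // read the DOCTYPE instead of rejecting it
            XmlResolver = resolver
        };

        var document = new XmlDocument();
        document.XmlResolver = resolver; // keep the document from resolving the DTD over the network

        using (XmlReader reader = XmlReader.Create(xhtmlStream, settings))
        {
            document.Load(reader);
        }
        return document;
    }

    // Same request/response calls as in the code above, with disposal added.
    public static XmlDocument LoadFromUrl(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        {
            return Load(responseStream);
        }
    }
}
```

Compared with the Substring/Replace version, this keeps the DOCTYPE node in the document, and &nbsp; arrives as the U+00A0 character rather than a plain space. It would still throw on pages that are not well-formed XML.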
Thanks!