
I want to get the plain text of a web page using the WebRequest class, just like what we get from webbrowser1.Document.Body.InnerText. I have tried the following code:

public string request_Resource()
{
   HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myurl);
   Stream stream = request.GetResponse().GetResponseStream();
   StreamReader sr = new StreamReader(stream);
   WebBrowser wb = new WebBrowser();
   wb.DocumentText = sr.ReadToEnd();
   return wb.Document.Body.InnerText;
}

When I execute this, I get a NullReferenceException.

Is there a better way to get the plain text?

Note: I cannot use the WebBrowser control directly to load the web page, because I don't want to deal with all the events that fire multiple times whenever a page is loaded.

UPDATE: Following a suggestion, I have changed my code to use the WebClient class instead of WebRequest. My code now looks something like this:

public string request_Resource()
{
   // WebClient is IDisposable, so release it when done
   using (WebClient wc = new WebClient())
   {
      wc.Proxy = null;
      // Send a browser-like User-Agent header; some servers reject requests without one
      wc.Headers.Add("user-agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 ( .NET CLR 3.5.30729; .NET4.0C)");
      return wc.DownloadString(myurl);
   }
}

I am considering using the HTML Agility Pack. Can anyone suggest a better alternative?

Vamsi
  • On the suggestion of @SLaks, I have looked at the HTML Agility Pack. Can anyone suggest a simple solution that doesn't use third-party libraries? Thank you – Vamsi Nov 25 '10 at 18:47
  • Check out this SO answer for using Html Agility Pack - http://stackoverflow.com/questions/2785092/c-htmlagilitypack-extract-inner-text/2785108#2785108 – Mikael Svenson Nov 25 '10 at 18:57
  • Thank you all for the HTML Agility Pack suggestion; I will definitely consider it, but before that, can anybody suggest another way to do this? As for the WebClient class, I have already changed my code. – Vamsi Nov 25 '10 at 19:03

3 Answers


You're looking for the HTML Agility Pack, which can parse the HTML without IE.
It has an InnerText property.


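To answer your actual question: the WebBrowser control parses DocumentText asynchronously, so wb.Document.Body is not ready yet when you read it, which causes the NullReferenceException. You need to wait for the browser to finish parsing (its DocumentCompleted event) before reading InnerText.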


By the way, you should use the WebClient class instead of WebRequest.

SLaks

Use WebClient:

public string request_Resource()
{
    using (WebClient wc = new WebClient())
    {
        byte[] data = wc.DownloadData(myuri);
        // Assumes the page is UTF-8 encoded
        return Encoding.UTF8.GetString(data);
    }
}

This gives you the raw HTML of the page. You can then use the HTML Agility Pack to parse the result and extract the text.
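Decoding the bytes with Encoding.UTF8 assumes the page is UTF-8. As a sketch of one alternative, assuming the server reports a charset parameter in its Content-Type response header (which WebClient exposes through wc.ResponseHeaders after the download), the encoding could be chosen from that header with a UTF-8 fallback. CharsetHelper.FromContentType is a hypothetical helper, not a framework API, and pages that declare their charset only in a meta tag are not handled.

```csharp
using System;
using System.Text;

// Hypothetical helper: pick a text encoding from a Content-Type header value
// such as "text/html; charset=iso-8859-1". Falls back to UTF-8 when the header
// is missing, has no charset parameter, or names an encoding the runtime does
// not know.
public static class CharsetHelper
{
    public static Encoding FromContentType(string contentType)
    {
        if (string.IsNullOrEmpty(contentType))
            return Encoding.UTF8;

        // Parameters are separated by ';', e.g. "text/html; charset=utf-8"
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                // Strip optional quotes: charset="utf-8"
                string name = p.Substring("charset=".Length).Trim().Trim('"');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // Unknown or unsupported charset name
                    return Encoding.UTF8;
                }
            }
        }
        return Encoding.UTF8;
    }
}
```

Inside the answer's request_Resource, the return line could then become return CharsetHelper.FromContentType(wc.ResponseHeaders[HttpResponseHeader.ContentType]).GetString(data); where HttpResponseHeader lives in System.Net. Falling back silently to UTF-8 keeps the method from throwing on odd headers, at the cost of possibly garbled characters.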

Aliostad

If you only need the plain HTML, you have already written that code:

public string request_Resource()
{
   HttpWebRequest request = (HttpWebRequest)WebRequest.Create(myurl);
   // Dispose the response and reader so the connection is released
   using (WebResponse response = request.GetResponse())
   using (StreamReader sr = new StreamReader(response.GetResponseStream()))
   {
      return sr.ReadToEnd();
   }
}
user179437
  • I clearly said that I need plain text, not plain HTML. Anyway, thank you – Vamsi Nov 26 '10 at 06:01
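Since the goal is plain text rather than HTML, and the comments ask for something without third-party libraries, here is a rough sketch using only the .NET base class library: remove script/style blocks and comments, replace the remaining tags with spaces, decode entities with WebUtility.HtmlDecode, and collapse whitespace. HtmlText.ToPlainText is a hypothetical helper. Regex-based HTML handling is fragile (a stray "<" in text or malformed markup will confuse it), and the output will not match WebBrowser's InnerText exactly; a real parser such as the HTML Agility Pack is more robust.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a crude, regex-based HTML-to-text converter.
// Assumes reasonably well-formed HTML; it is not a real parser.
public static class HtmlText
{
    // <script> and <style> elements, including their contents
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // HTML comments
    private static readonly Regex Comment = new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // Any remaining tag
    private static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Runs of whitespace (including the non-breaking space from &nbsp;)
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string ToPlainText(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        string text = ScriptOrStyle.Replace(html, " ");
        text = Comment.Replace(text, " ");
        // Replace tags with a space so words in adjacent elements do not merge
        text = Tag.Replace(text, " ");
        // Decode entities only after tags are gone, so a decoded "<" is not
        // mistaken for markup
        text = WebUtility.HtmlDecode(text);
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

For example, HtmlText.ToPlainText(wc.DownloadString(myurl)) would combine it with the WebClient code from the question. Note that it also keeps text from the head (such as the title), whereas Document.Body.InnerText only covers the body.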