Get document content through web browser element

Question

I want to get content from a specific URL. I tried this code:

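using System;
using System.IO;
using System.Net;

var request = (HttpWebRequest)WebRequest.Create("https://example.com");
request.Timeout = 5000; // milliseconds
request.Method = "GET";
// No ContentType: a GET request has no body, so "text/xml" has no effect here.

using (var webResponse = (HttpWebResponse)request.GetResponse())
{
    var webResponseStatus = webResponse.StatusCode;

    // Read the raw response body: the page's HTML source, scripts included.
    using (var stream = webResponse.GetResponseStream())
    using (var streamReader = new StreamReader(stream))
    {
        string plainText = streamReader.ReadToEnd();
        Console.WriteLine(plainText);
    }
}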

The problem is that the content isn't what I expected; it looks like a block of garbage. For example, this is the beginning of the content I receive:

<!doctype html><html itemscope=\"\"
itemtype=\"http://schema.org/WebPage\" dir=\"rtl\"><head><meta
itemprop=\"image\"
content=\"/images/google_favicon_128.png\"><title>Google</title><script>(function(){\nwindow.google={kEI:\"JVMWU4OxMuL9ygOem4GACw\",getEI:function(a){for(var
b;a&&(!a.getAttribute||!(b=a.getAttribute(\"eid\")));)a=a.parentNode;return
b||google.kEI},https:function(){return\"https:\"==window.location.protocol},kEXPI:\"17259,4000116,4007661,4007830,4008067,4008133,4008142,4009033,4009565,4009641,4010297,4010806,4010830,4010858,4010899,4011228,4011258,4011679,4012318,4012373,40125

I want to get the text that is displayed on the web page. How do I do this? I'd be thankful for any help. Thank you, Avi.

score 2 · Answer 1 · edited May 23 '17 at 11:57

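That's not garbage. What's returned is the page's HTML source, and most of it is embedded JavaScript. When the page is loaded in a browser, the browser executes that JavaScript, which downloads more data and modifies the DOM. Your code receives only the initial source, before any of that runs.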
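To see the difference between the source and the displayed text, the sketch below (not part of the original answer) strips script and style blocks and then all remaining tags from a static HTML string, keeping only the text in between. ExtractVisibleText is a hypothetical helper name, and regular expressions are not a real HTML parser, so treat this as an illustration rather than a robust extractor. Note that the empty div id="feed" stays empty: text that a page's JavaScript inserts at run time is never in the downloaded HTML.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a rough "visible text" extraction from STATIC html.
// Regex is not an HTML parser; this only shows that <script> bodies are code, not page text.
static string ExtractVisibleText(string html)
{
    // Drop <script> and <style> elements together with their contents.
    string noScripts = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Drop every remaining tag, keeping the text between tags.
    string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");

    // Decode entities such as &amp; and collapse runs of whitespace.
    string decoded = WebUtility.HtmlDecode(noTags);
    return Regex.Replace(decoded, @"\s+", " ").Trim();
}

// The empty "feed" div stands in for content a script would fill in later.
string sample = "<html><head><script>var x = \"<b>not text</b>\";</script></head>"
              + "<body><p>Hello &amp; welcome</p><div id=\"feed\"></div></body></html>";

Console.WriteLine(ExtractVisibleText(sample)); // prints "Hello & welcome"
```

For a static page this recovers roughly the text a browser shows; for a page built by JavaScript you still need a browser control or a JavaScript engine to run the scripts first.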

If you want the rendered HTML, you can either host a WebBrowser control that loads the page and then read the DOM through that control, or add a JavaScript engine to your C# program and have it interpret the web page. See the question "Embedding JavaScript engine into .NET" for information on how to do that.

answered Mar 04 '14 at 22:42 by Jim Mischel (edited by Community)

What is the simplest and most efficient way to do it? Isn't it possible to fetch the HTML directly? Thank you. – Avraham Mar 04 '14 at 22:54
I found the problem: Facebook doesn't support the WebBrowser element in C#, and that's why I couldn't fetch the content from the Facebook page. From other web pages, like Google, I can fetch any text I want. So it seems I don't need any JavaScript-to-HTML conversion, but rather a way to fetch the content from Facebook specifically. Thank you once again, Avi. – Avraham Mar 04 '14 at 23:05
You fetch content from Facebook by calling the Facebook API. See https://developers.facebook.com/docs/reference/apis/ – Jim Mischel Mar 04 '14 at 23:16
Yes, I was just thinking about that! Thank you! – Avraham Mar 04 '14 at 23:17
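Following Jim's suggestion, a Facebook Graph API call is an ordinary HTTPS GET against graph.facebook.com, with the object id or name in the path and, for example, a fields list and an access token in the query string. The sketch below only builds such a URL. BuildGraphUrl, the page name "example-page" and the token value are hypothetical placeholders, and which fields you may read depends on your token's permissions, so check the current Graph API reference before relying on it.

```csharp
using System;

// Hypothetical helper: builds a Graph API request URL for one node (e.g. a page id or name).
// Every component is percent-encoded, so a field list "name,about" becomes "name%2Cabout".
static string BuildGraphUrl(string node, string[] fields, string accessToken)
{
    string fieldList = string.Join(",", fields);
    return "https://graph.facebook.com/" + Uri.EscapeDataString(node)
         + "?fields=" + Uri.EscapeDataString(fieldList)
         + "&access_token=" + Uri.EscapeDataString(accessToken);
}

// Placeholder values only; a real call needs a valid token with the right permissions.
string url = BuildGraphUrl("example-page", new[] { "name", "about" }, "YOUR_ACCESS_TOKEN");
Console.WriteLine(url);
```

The resulting URL could be fetched with the same HttpWebRequest code as in the question; the Graph API answers with JSON rather than an HTML page, so there is no JavaScript to execute.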
