-3

I need to get the source code from a website, which is structured in frames.

I already have a Windows Forms application with a WebBrowser control integrated into it.
When I do a right click and select "View Source" it opens a new text document with exactly the info I need.

I already tried webBrowser.Document, webBrowser.DocumentText and webBrowser.DocumentStream, but all of these only give me other information that I don't need.

The website is not static (it's a chat) and it does not use sessions, so I can't use WebClient.DownloadFile.
I need to keep a connection to the website open for several hours without refreshing the page. I don't see a way around using the WebBrowser control in Windows Forms.

As requested, this is the website I am talking about: http://server2.webkicks.de/stackoverflow-test/
You can log in as a guest by entering any username in the third textbox.

NotTelling
  • 1
    Do some research before posting Q please. – Tatranskymedved Nov 28 '16 at 09:31
  • 3
    Possible duplicate of [How can I download HTML source in C#](http://stackoverflow.com/questions/599275/how-can-i-download-html-source-in-c-sharp) – Tatranskymedved Nov 28 '16 at 09:31
  • Why don't you just use `HttpClient` to download from a website? https://www.dotnetperls.com/httpclient – Ali Bahrami Nov 28 '16 at 09:32
  • @Tatranskymedved This actually does not help me a single bit – NotTelling Nov 28 '16 at 09:35
  • @Ali The website uses a login system, it is not static – NotTelling Nov 28 '16 at 09:36
  • 1
    I think `WebBrowser` or any Headless WebBrowsers like `CefSharp` isn't an answer for what you are doing. I suggest you to take a look on this page:http://stackoverflow.com/questions/17183703/using-webclient-or-webrequest-to-login-to-a-website-and-access-data – Ali Bahrami Nov 28 '16 at 09:38
  • That duplicate does exactly what your title asks for, gets html source. But modern webpages use *much* more than html – Sayse Nov 28 '16 at 09:39
  • @Ali Thanks, I will look into that – NotTelling Nov 28 '16 at 09:40
  • @Sayse Well, it does not answer what I describe in the question itself. The title is not the whole question, I believe – NotTelling Nov 28 '16 at 09:41
  • @Ali Sadly, this is not what I was looking for. The website, I'm trying to access, does not use sessions, so I need to uphold a connection after logging in and read new html every ~3 seconds – NotTelling Nov 28 '16 at 09:45
  • Possible duplicate of [Getting the HTML source through the WebBrowser control in C#](http://stackoverflow.com/questions/5164733/getting-the-html-source-through-the-webbrowser-control-in-c-sharp) – sam Nov 28 '16 at 09:46
  • @sam I already saw this question on friday. It does not work in my case – NotTelling Nov 28 '16 at 09:47

2 Answers

1

You want the dynamic HTML content, and webBrowser.Document, webBrowser.DocumentText and webBrowser.DocumentStream are not giving you what you need.

Here's the trick: You can always run your custom JavaScript code from C#. And here's how you can get the current HTML in your WebBrowser control:

webBrowser.Document.InvokeScript("eval", new string[]{"document.body.outerHTML"});

See also the question "How to inject Javascript in WebBrowser control?".

Update

For iframe inside your document, you can try the following:

webBrowser.Document.InvokeScript("eval", new string[]{"document.querySelector(\"iframe\").contentWindow.document.documentElement.outerHTML"});

Another update

Since your site uses a <frame> rather than an <iframe>, here is how you can get the HTML content of that frame:

webBrowser.Document.InvokeScript("eval", new string[]{"document.querySelector(\"frame[name='mainframe']\").contentWindow.document.documentElement.outerHTML"});

Final tested and working update

querySelector does not work in the WebBrowser control. The workaround is to assign an id to your <frame> and fetch that <frame> element by id. Here is how you can achieve your task:

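// Requires: using System.Linq; (for Cast<T>() and FirstOrDefault())
// Find the <frame> named "mainframe" in the top-level document.
HtmlElement frame = webBrowser1.Document.GetElementsByTagName("frame")
    .Cast<HtmlElement>()
    .FirstOrDefault(m => m.GetAttribute("name") == "mainframe");
if (frame != null)
{
    // Give the frame a unique id so the injected script can look it up.
    frame.Id = "RandID_" + DateTime.Now.Ticks;
    // InvokeScript may return null, so convert the result instead of calling ToString() on it.
    object result = webBrowser1.Document.InvokeScript("eval", new string[] { "document.getElementById('" + frame.Id + "').contentWindow.document.documentElement.outerHTML" });
    string html = Convert.ToString(result);
    Console.WriteLine(html);
}
else
{
    MessageBox.Show("Frame not found");
}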
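Since the question asks for reading the changing HTML every few seconds and writing it to a text file, the snippet above could be wrapped in a timer. The following is only a minimal sketch under assumptions: the class name ChatLoggerForm, the helper ReadFrameHtml, the file name chat.log and the 3-second interval are all hypothetical, and the frame name "mainframe" is taken from the code above. It has not been tested against the site.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Windows.Forms;

// Hypothetical form: names, file path and interval are illustrative assumptions.
public class ChatLoggerForm : Form
{
    private readonly WebBrowser webBrowser1 =
        new WebBrowser { Dock = DockStyle.Fill, ScriptErrorsSuppressed = true };
    // System.Windows.Forms.Timer ticks on the UI thread, so it can touch the control safely.
    private readonly Timer pollTimer = new Timer { Interval = 3000 };
    private string lastHtml = "";

    public ChatLoggerForm()
    {
        Controls.Add(webBrowser1);
        pollTimer.Tick += (s, e) => Poll();
        // Start polling once a document has loaded; calling Start() again is harmless.
        webBrowser1.DocumentCompleted += (s, e) => pollTimer.Start();
        webBrowser1.Navigate("http://server2.webkicks.de/stackoverflow-test/");
    }

    private void Poll()
    {
        string html = ReadFrameHtml("mainframe");
        // Skip empty results (frame not there yet) and unchanged content.
        if (html.Length == 0 || html == lastHtml) return;
        lastHtml = html;
        File.AppendAllText("chat.log", html + Environment.NewLine);
    }

    // Same technique as above: tag the frame with an id, then read it via injected script.
    private string ReadFrameHtml(string frameName)
    {
        HtmlDocument doc = webBrowser1.Document;
        if (doc == null) return "";
        HtmlElement frame = doc.GetElementsByTagName("frame").Cast<HtmlElement>()
            .FirstOrDefault(f => f.GetAttribute("name") == frameName);
        if (frame == null) return "";
        if (string.IsNullOrEmpty(frame.Id)) frame.Id = "RandID_" + DateTime.Now.Ticks;
        object result = doc.InvokeScript("eval", new object[] {
            "document.getElementById('" + frame.Id + "').contentWindow.document.documentElement.outerHTML" });
        return Convert.ToString(result); // yields "" when the script returned null
    }

    [STAThread]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new ChatLoggerForm());
    }
}
```

The page stays open in the control the whole time, so the login is not lost between reads; only the frame's current HTML is sampled on each tick.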
sam
  • Thanks for your answer. Although this does give me html source code, it is not the one, i was looking for. I think I need the source code of the frame, I am looking at. Injecting Javascript is the best way though, as you suggested – NotTelling Nov 28 '16 at 10:08
  • @TristanB. your ques. doesn't say `Iframe` anywhere. No worry, I am updating the answer for iframe. – sam Nov 28 '16 at 10:12
  • I'm sorry. I have trouble putting my problem into technical terms, as I am not a professional, but a learner. Thanks! – NotTelling Nov 28 '16 at 10:13
  • @TristanB. No problem. Sorry, if i discouraged you. I also did the similar thing once upon a time. lol. I have updated the answer. try once. – sam Nov 28 '16 at 10:15
  • You did not discourage me, no worries. I tried your updated solution and it prints an empty new line into my file. Would it help, to see the website in question for yourself? – NotTelling Nov 28 '16 at 10:23
  • I edited the question. You can find the website there. – NotTelling Nov 28 '16 at 10:37
  • @TristanB. I have the updated the answer with **another update**. I hope this finally works for you. – sam Nov 28 '16 at 11:04
  • Thanks for your time to this point. This again prints an empty line. Maybe this is due to my printing method. Can I print the object from your answer by using The `WriteLine` method from `StreamWriter` to write it to a txt file? – NotTelling Nov 28 '16 at 11:14
  • @TristanB. I have updated the answer with **Final tested and working update**. Try once. It will work. :) – sam Nov 29 '16 at 05:43
  • Awesome. It helped. :) – sam Nov 29 '16 at 08:44
0

If the target website uses HTTPS, you can try adding a User-Agent header like this:

using (WebClient myWebClient = new WebClient())
{
    myWebClient.Headers.Add("User-Agent: Other");
    myWebClient.DownloadFile(new System.Uri("https://mywebsite.com//somefile"), "D:\\temp\\somefile");
}

If the target website requires a login, log in to it in Chrome, use the EditThisCookie extension to copy your cookies, and try this:

using (WebClient myWebClient = new WebClient())
{
    myWebClient.Headers.Add("User-Agent: Other");
    myWebClient.Headers.Add(HttpRequestHeader.Cookie, "mycookies copies from EditThisCookie");
    myWebClient.DownloadFile(new System.Uri("https://mywebsite.com//somefile"), "D:\\temp\\somefile");
}
Mehdi Souregi
  • Thanks for your answer. In my case, I don't need to download a file, but keep constant track of fast changing html. To even get to that html, I'm looking for, the website in question needs to be opened. If I close it, I will need to login again. Does your answer work for these conditions? – NotTelling Nov 28 '16 at 09:58
  • For the first part of your question, you can try DownloadString instead of DownloadFile and wrap it in a while (true) loop with a Thread.Sleep(2000) inside, which means you will check the content of your target page every 2000 ms – Mehdi Souregi Nov 28 '16 at 10:07
  • For the second part, a cookie has an expiration date, which means that once it has expired you can no longer get the content of your target page. The only solution then is to do it manually: log in again, copy your cookies and insert them into your WebClient header. – Mehdi Souregi Nov 28 '16 at 10:13
  • I tried it now, by adding a copied cookie to the WebClient Header. This just gets me to the normal login page. I think, this web page does not work with cookies, or at least not in a helpful way for this scenario – NotTelling Nov 28 '16 at 10:17
  • If you close your browser, and re-open it again, do you need to log in again ? – Mehdi Souregi Nov 28 '16 at 10:29
  • Yes. Same for closing the tab – NotTelling Nov 28 '16 at 10:30
  • can you share with us the website link? i am really interested to know how it works and maybe i can find a solution for you since i am working on a similar project of downloading content from a ssl website – Mehdi Souregi Nov 28 '16 at 10:32
  • I have edited the question. The website link is there. – NotTelling Nov 28 '16 at 10:37
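For reference, the polling loop suggested in the comments above (DownloadString with a Thread.Sleep(2000)) might look roughly like this. It is a sketch only: the class name PollPage is hypothetical, the URL is the placeholder from the answer, and, as the asker reports in the comments, this approach returned only the login page for the site in question.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical console program illustrating the polling idea from the comments.
class PollPage
{
    static void Main()
    {
        // Placeholder URL copied from the answer; replace with the real target.
        var url = new Uri("https://mywebsite.com//somefile");
        using (var client = new WebClient())
        {
            while (true)
            {
                // Set the header on every iteration so each request carries it.
                client.Headers[HttpRequestHeader.UserAgent] = "Other";
                string html = client.DownloadString(url);
                Console.WriteLine("Fetched " + html.Length + " characters");
                // Wait 2000 ms before checking the page again.
                Thread.Sleep(2000);
            }
        }
    }
}
```

Each iteration is a separate request, so this only helps when the server can identify the client across requests, for example through a cookie header added as in the answer.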