
Visual Studio 2017, Windows 10, .NET Framework 4.8, CefSharp WinForms 83.4.20, platform target x64

I created a new, very simple CefSharp Windows Forms application, but I cannot get the HTML source of the web page. I think I have looked at every CefSharp and async-to-sync question on Stack Overflow and tried so many solutions that my head has turned to mush. This is the first question I looked at, and I have the same problem:

Get HTML source code from CefSharp web browser

'browser.ViewSource();' does pop up Notepad with the page's source. But when I try to get the source as a string, the task never seems to run: it stays at Status = WaitingForActivation and never returns the source.

I have tried the async-to-sync conversion probably ten different ways, and none of them work. One Stack Overflow answer suggested Application.DoEvents(), so I even tried that.

I hope someone has some ideas. This browser seems to have a ton of potential, but I need to get the web page's HTML source.

using System;
using System.Threading.Tasks;
using System.Windows.Forms;
using CefSharp;
using CefSharp.WinForms;
using System.Diagnostics;
namespace Test1
{
    public partial class Form1 : Form
    {
        public ChromiumWebBrowser browser;
        public Form1()
        {
            InitializeComponent();
            InitBrowser();
        }
        private void Form1_Load(object sender, EventArgs e)
        {
        }
        private void Form1_FormClosing(object sender, FormClosingEventArgs e)
        {
            browser.Dispose();
            Cef.Shutdown();
        }
        private void exitToolStripMenuItem_Click(object sender, EventArgs e)
        {
            Application.Exit();
        }
        public void InitBrowser()
        {
            Cef.Initialize(new CefSettings());
            browser = new ChromiumWebBrowser("https://google.com/");
            this.Controls.Add(browser);
            browser.Dock = DockStyle.Fill;
            browser.FrameLoadEnd += OnWebBrowserFrameLoadEnded;
        }
        void OnWebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
        {
            ChromiumWebBrowser BrowserSender = (ChromiumWebBrowser)sender;
            if (this.InvokeRequired)
            {
                this.Invoke(new MethodInvoker(() => { WebBrowserFrameLoadEnded(BrowserSender, e); }));
            }
            else
            {
                WebBrowserFrameLoadEnded(BrowserSender, e);
            }
        }
        void WebBrowserFrameLoadEnded(ChromiumWebBrowser BrowserSender, FrameLoadEndEventArgs e)
        {
            string html1 = null;
            Task<String> taskString1;

            if (e.Frame.IsMain)
            {
                //browser.ViewSource();
                taskString1 = Task.Run(() => GetBrowserSource(browser));
                while (taskString1.Status != TaskStatus.RanToCompletion)
                {
                    Application.DoEvents();
                    System.Threading.Thread.Sleep(100);
                }
                html1 = taskString1.Result;
                Debug.WriteLine("");
            }
        }

        async Task<string> GetBrowserSource(ChromiumWebBrowser Browser)
        {
            return await Browser.GetMainFrame().GetSourceAsync();
        }
    }
}

My app.config:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <startup> 
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.8"/>
    </startup>
</configuration>

My packages.config:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="cef.redist.x64" version="83.4.2" targetFramework="net452" />
  <package id="cef.redist.x86" version="83.4.2" targetFramework="net452" />
  <package id="CefSharp.Common" version="83.4.20" targetFramework="net452" />
  <package id="CefSharp.WinForms" version="83.4.20" targetFramework="net452" />
</packages>
Asked by Bubba (edited by Uwe Keim)
  • The method should be called asynchronously; calling it synchronously is not supported. – amaitland Aug 16 '20 at 20:01
  • Wanted to add a little background information to help others. I am a .NET programmer and have done screen scraping for many years. We used to be able to use the Windows Forms WebBrowser control to simply navigate to a page, read the HTML source, and extract what we needed. HtmlAgility was a huge improvement for working with the HTML source; we could even drop the WebBrowser control and load the HTML directly into HtmlAgility. – Bubba Aug 17 '20 at 09:47
  • But as anti-scraping tools on websites grew, it became difficult to load HTML directly into HtmlAgility fully automatically. I went back to a semi-automatic approach: use the WebBrowser control to load the login page, log in manually, then let HtmlAgility take over after the login. The latest anti-scraping tools now require cookies and JavaScript to be running in the browser, so it's difficult to accomplish even the login with the Windows Forms WebBrowser control. I have not noticed any problems using CefSharp. – Bubba Aug 17 '20 at 09:50

Answer

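This looks like a deadlock, caused by how async/await is used. Most likely, the CEF thread that raised FrameLoadEnd is blocked inside this.Invoke(...) waiting for the UI thread, while the UI thread spins in the loop waiting for a task that cannot complete until that CEF thread is free again. Don't block on the task; await it: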

private async void WebBrowserFrameLoadEnded(ChromiumWebBrowser BrowserSender, FrameLoadEndEventArgs e)
{
    if (e.Frame.IsMain)
    {
        string html1 = await GetBrowserSource(BrowserSender);
        Debug.WriteLine(html1);
    }
}

But why not simply await the source directly in the event handler, without the extra method and the Invoke call?

private async void OnWebBrowserFrameLoadEnded(object sender, FrameLoadEndEventArgs e)
{
    if (e.Frame.IsMain)
    {
        ChromiumWebBrowser browserSender = (ChromiumWebBrowser)sender;
        string html = await browserSender.GetMainFrame().GetSourceAsync();
        Debug.WriteLine(html);
    }
}

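Note that Application.DoEvents() isn't safe to use: it processes pending window messages re-entrantly, so other event handlers can run in the middle of your code.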
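The polling loop in the question has a second weakness besides blocking: it only exits when the status is TaskStatus.RanToCompletion, so a task that faults or is canceled would keep it spinning forever. A small console sketch (no CefSharp involved; FailingSourceAsync and ObserveAsync are hypothetical names, with FailingSourceAsync standing in for a GetSourceAsync call that fails) shows that a failed task ends in Faulted rather than RanToCompletion, that IsCompleted covers every final state, and that await rethrows the failure instead of hanging:

```csharp
using System;
using System.Threading.Tasks;

public static class TaskStatusDemo
{
    // Hypothetical stand-in for a GetSourceAsync() call that fails after
    // going asynchronous, as a real browser call might.
    public static async Task<string> FailingSourceAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("simulated failure");
    }

    // Awaits the task, reports the rethrown exception, and returns the
    // task's final status.
    public static async Task<TaskStatus> ObserveAsync(Task<string> task)
    {
        try
        {
            await task; // await rethrows the task's exception here
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("await rethrew: " + ex.Message);
        }
        return task.Status;
    }

    public static void Main()
    {
        Task<string> task = FailingSourceAsync();

        // Blocking here is harmless only because a console app has no
        // SynchronizationContext, so the awaits inside never need the
        // blocked main thread to resume.
        TaskStatus status = ObserveAsync(task).GetAwaiter().GetResult();

        Console.WriteLine(status);           // Faulted, never RanToCompletion
        Console.WriteLine(task.IsCompleted); // True: IsCompleted covers all final states
    }
}
```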

Answered by aepot
  • Thank you. Based on your answer, I got it to work. Regarding the use of "if (this.InvokeRequired)": they say the CefSharp browser runs on different threads, so we should always check InvokeRequired. – Bubba Aug 16 '20 at 22:47
  • @Bubba you're welcome. Some tips. In short, `this.Invoke` runs code on the UI thread, while `Task.Run` (by default) runs it on a pooled background thread. The first is usually used for safe UI operations, the second for heavy CPU-bound operations, to keep the UI responsive. For I/O-bound operations, `await` is enough. So `this.Invoke` is needed only if you interact with the UI; otherwise it just schedules redundant work onto the main thread that could run on any thread. – aepot Aug 17 '20 at 05:27
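The Task.Run half of this tip can be checked in a plain console app (this.Invoke needs a WinForms control, so it is left out). In this sketch, ThreadDemo and IoBoundAsync are hypothetical names, and Task.Delay merely stands in for an I/O-bound call such as GetSourceAsync:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadDemo
{
    // True when the calling code runs on a thread-pool thread.
    public static bool OnPoolThread() => Thread.CurrentThread.IsThreadPoolThread;

    // Task.Run queues the delegate to the thread pool (the default scheduler).
    public static bool TaskRunUsesPool() =>
        Task.Run(() => OnPoolThread()).GetAwaiter().GetResult();

    // I/O-style work needs no Task.Run: a pending Task.Delay holds no thread,
    // it just completes the task when its timer fires.
    public static async Task<string> IoBoundAsync()
    {
        await Task.Delay(50);
        return "done"; // placeholder result, not real page source
    }

    public static void Main()
    {
        Console.WriteLine(OnPoolThread());    // False: Main runs on the main thread
        Console.WriteLine(TaskRunUsesPool()); // True
        Console.WriteLine(IoBoundAsync().GetAwaiter().GetResult()); // done
    }
}
```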