I have a Windows Form that has a WebBrowser control named formWebBrowser. I am creating a new non-UI thread with another instance of WebBrowser named newThreadBrowser (reference: WebBrowser Control in a new thread).

When the DocumentCompleted event fires, I am able to write the URL to a textbox using the approach mentioned in C# - Updating GUI using non-main Thread.

Now I am trying to update the HTML of formWebBrowser from the HTML of newThreadBrowser. This throws an exception with the message "Specified cast is not valid."

The accepted answer to WebBrowser control: "Specified cast is not valid." says:

WebBrowser is a COM component under the hood. An apartment threaded one, COM takes care of calling its methods in a thread-safe way. Your Navigate() call works for that reason, it is actually executed on the UI thread. What doesn't work is the DocumentText property, it is implemented in the .NET wrapper and they somewhat fumbled the code. It bombs when the COM interop support in the CLR notices that a thread in the MTA tries to access a property of a component that lives on an STA.

QUESTION

What should I do in order to render the HTML from newThreadBrowser in formWebBrowser? I am not sure how Control.Invoke() can resolve this.

Note: This application is not performance critical, so it is okay even if it takes some time to execute.

Reference

  1. How to change webBrowser DocumentText?
  2. How do I extract info from a webpage?
  3. http://htmlagilitypack.codeplex.com/

From WebBrowser.DocumentText Property

Use this property when you want to manipulate the contents of an HTML page displayed in the WebBrowser control using string processing tools. You can use this property, for example, to load pages from a database or to analyze pages using regular expressions. When you set this property, the WebBrowser control automatically navigates to the about:blank URL before loading the specified text. This means that the Navigating, Navigated, and DocumentCompleted events occur when you set this property, and the value of the Url property is no longer meaningful.

CODE

public partial class Form1 : Form
{

    public void WriteToTextBoxEvent(object sender, WebBrowserDocumentCompletedEventArgs e)
    {

        #region Textbox
        if (this.textBox1.InvokeRequired)
        {
            //BeginInvoke is asynchronous
            this.textBox1.BeginInvoke(new Action(() => WriteToTextBoxEvent(sender, e)));
        }
        else
        {
            textBox1.Text = e.Url.ToString();
        }
        #endregion

        #region WebBrowser
        if (this.formWebBrowser.InvokeRequired)
        {
            //BeginInvoke is asynchronous
            this.formWebBrowser.BeginInvoke(new Action(() => WriteToTextBoxEvent(sender, e)));
        }
        else
        {
            var newThreadBrowser = sender as WebBrowser;
            if (newThreadBrowser != null)
            {
                //The function evaluation requires all threads to run
                formWebBrowser.DocumentText = newThreadBrowser.DocumentText;
            }
        }
        #endregion
    }



    System.Windows.Forms.TextBox textBox1 = new TextBox();
    System.Windows.Forms.WebBrowser formWebBrowser = new WebBrowser();

    public Form1()
    {

        WriteLogFunction("App Start");

        // Web Browser
        #region Web Browser
        formWebBrowser.Location = new Point(10, 20);
        formWebBrowser.Size = new Size(1200, 900);
        this.Controls.Add(formWebBrowser);

        textBox1.Location = new Point(0, 0);
        textBox1.Size = new Size(800, 10);
        this.Controls.Add(textBox1);

        var th = new Thread(() =>
        {
            var newThreadBrowser = new WebBrowser();

            //To Process the DOM.
            newThreadBrowser.DocumentCompleted += browser_DocumentCompleted;

            //To update URL textbox
            newThreadBrowser.DocumentCompleted += WriteToTextBoxEvent;

            newThreadBrowser.ScriptErrorsSuppressed = true;
            newThreadBrowser.Navigate(GetHomoePageUrl());

            Application.Run();
        });
        th.SetApartmentState(ApartmentState.STA);
        th.Start();

        #endregion

        // Form1
        this.Text = "B2B Crawler";
        this.Size = new Size(950, 950);

    }

    List<string> visitedUrls = new List<string>();
    List<string> visitedProducts = new List<string>();

    private void ExerciseApp(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        var wbReceived = sender as WebBrowser;
        int catalogElementIterationCounter = 0;
        var elementsToConsider = wbReceived.Document.All;
        string productUrl = String.Empty;
        bool isClicked = false;

        foreach (HtmlElement e1 in elementsToConsider)
        {
            catalogElementIterationCounter++;
            string x = e1.TagName;
            String idStr = e1.GetAttribute("id");
            if (!String.IsNullOrWhiteSpace(idStr))
            {
                //Each Product Navigation
                if (idStr.Contains("catalogEntry_img"))
                {
                    productUrl = e1.GetAttribute("href");
                    if (!visitedProducts.Contains(productUrl))
                    {
                        WriteLogFunction("productUrl -- " + productUrl);
                        visitedProducts.Add(productUrl);
                        isClicked = true;

                        e1.InvokeMember("Click");
                        //nextNavigationUrl = productUrl;

                        break;
                    }

                }
            }
        }


        if (visitedProducts.Count == 4)
        {
            visitedProducts = new List<string>();
            isClicked = true;
            HomoePageNavigate(wbReceived);
        }

        if (!isClicked)
        {
            HomoePageNavigate(wbReceived);
        }
    }

    void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        ExerciseApp(sender, e);
    }


    private string GetHomoePageUrl()
    {
        return @"C:\Samples_L\MyTableTest.html";
    }

    private void HomoePageNavigate(WebBrowser bw)
    {
        WriteLogFunction("HomoePageNavigate");
        bw.Navigate(GetHomoePageUrl());
    }

    private void WriteLogFunction(string strMessage)
    {
        using (StreamWriter w = File.AppendText("log.txt"))
        {
            w.WriteLine("\r\n{0} ..... {1} ", DateTime.Now.ToLongTimeString(), strMessage);
        }
    }

 }

MyTableTest.html

<html>
<head>

    <style type="text/css">
        table {
            border: 2px solid blue;
        }

        td {
            border: 1px solid teal;
        }
    </style>

</head>
<body>

    <table id="four-grid">
         <tr>
            <td>
                <a href="https://www.wikipedia.org/" id="catalogEntry_img63666">

                    <img src="ssss"
                        alt="B" width="70" />
                </a>
            </td>
            <td>
                <a href="http://www.keralatourism.org/" id="catalogEntry_img63667">

                    <img src="ssss"
                        alt="A" width="70" />
                </a>
            </td>
        </tr>
        <tr>
            <td>
                <a href="https://stackoverflow.com/users/696627/lijo" id="catalogEntry_img63664">

                    <img src="ssss"
                        alt="G" width="70" />
                </a>
            </td>
            <td>
                <a href="http://msdn.microsoft.com/en-US/#fbid=zgGLygxrE84" id="catalogEntry_img63665">

                    <img src="ssss"
                        alt="Y" width="70" />
                </a>
            </td>
        </tr>

    </table>
</body>

</html>
LCJ
  • Could you explain why exactly you need multiple threads here? – noseratio Apr 03 '14 at 09:38
  • @Noseratio If I do the DOM processing in the UI thread, that would block the UI thread, wouldn't it? It will affect UI operations like minimizing the window. What are your thoughts on that? Reference [MSDN - Safe, Simple Multithreading in Windows Forms](http://msdn.microsoft.com/en-us/library/ms951089.aspx) – LCJ Apr 03 '14 at 11:39
  • It depends on how heavy the processing is. The web page has to be really big to cause any lags. Are you experiencing them? – noseratio Apr 03 '14 at 11:43
  • @Noseratio In my real scenario, the web page is heavy and I need to navigate through all the elements and do heavy processing. At times it causes the application to become non-responsive. – LCJ Apr 03 '14 at 11:46

1 Answer

Firstly, note that WebBrowser.DocumentText is a static snapshot: it holds the original content as downloaded, without any DOM/AJAX changes. To get the actual current HTML, do this on your background thread:

var html = hiddenWebBrowser.Document.GetElementsByTagName("html")[0].OuterHtml;

Then you can update another instance of WebBrowser on the UI thread:

mainForm.BeginInvoke(new Action(() => mainForm.webBrowser.DocumentText = html));

Note, BeginInvoke is asynchronous, and so is the DocumentText assignment. The DocumentCompleted event will be fired for mainForm.webBrowser when the HTML has loaded.
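
A minimal sketch of how these two steps could be combined in the question's code. It assumes the `textBox1` and `formWebBrowser` fields from the question's `Form1`; the handler name `CopyHtmlToFormBrowser` is hypothetical and would be subscribed to `newThreadBrowser.DocumentCompleted` in place of `WriteToTextBoxEvent`.

```csharp
using System;
using System.Windows.Forms;

// Partial class: textBox1 and formWebBrowser are the fields declared in the question.
public partial class Form1 : Form
{
    // Hypothetical handler. It runs on the background STA thread that owns
    // newThreadBrowser, because that thread pumps the browser's events.
    private void CopyHtmlToFormBrowser(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        var newThreadBrowser = sender as WebBrowser;
        if (newThreadBrowser == null || newThreadBrowser.Document == null)
        {
            return;
        }

        // Read everything needed from newThreadBrowser here, on its own thread.
        var roots = newThreadBrowser.Document.GetElementsByTagName("html");
        if (roots.Count == 0)
        {
            return;
        }
        string html = roots[0].OuterHtml;
        string url = e.Url != null ? e.Url.ToString() : string.Empty;

        // Only plain strings cross the thread boundary; the UI thread never
        // touches newThreadBrowser, and this thread never touches formWebBrowser.
        this.BeginInvoke(new Action(() =>
        {
            textBox1.Text = url;
            formWebBrowser.DocumentText = html;
        }));
    }
}
```

The difference from the question's handler is where `newThreadBrowser` is read: the question re-invokes the whole handler on the UI thread and reads `newThreadBrowser.DocumentText` there. Note that `BeginInvoke` requires the form's window handle to exist, so this sketch assumes the form is already shown by the time the first page finishes loading.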

noseratio
  • @Lijo, one more thought about this. You referenced HTML Agility Pack in your question. If you use it to process the page, and use `WebBrowser` only to load the page, you *really* don't need a background thread for loading. Load it on the main UI thread, then offload the HTML Agility Pack processing to a background thread. – noseratio Apr 03 '14 at 19:17