I have developed a simple Windows Forms application in C# that checks whether a set of given Ids are valid. It passes each Id to a web page using the WebBrowser control and reads the reply. Everything works fine, but it takes 40 to 60 seconds to check 20 Ids one by one. Now I want to speed up the process using advanced threading concepts in C#.

The code works; I just want to improve its performance using threading. Any simple suggestion would be a great help.

private void button2_Click(object sender, EventArgs e)
       {
           string url = "https://idscheckingsite.com";
           WebBrowser wb = new WebBrowser();
           wb.ScriptErrorsSuppressed = true;
           wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(Final_DocumentCompleted);
           wb.Navigate(url);

       }

private void Final_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
       {
           WebBrowser wbs = sender as WebBrowser;
           wbs.Document.GetElementById("pannumber").InnerText = ListsofIds[ids];
           wbs.Document.GetElementById("frmType1").SetAttribute("value", "24Q");
           HtmlElement btnlink = wbs.Document.GetElementById("clickGo1");
           btnlink.InvokeMember("Click");

           //string response = wbs.DocumentText;
           wbs.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(Final_DocumentCompleted);
           wbs.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(Final_result);
       }


private void Final_result(object sender, WebBrowserDocumentCompletedEventArgs e)
       {

           WebBrowser wbResult = sender as WebBrowser;

           string status = wbResult.Document.GetElementById("status").InnerText;
           string name = wbResult.Document.GetElementById("name").InnerText;

           wbResult.DocumentCompleted -= new WebBrowserDocumentCompletedEventHandler(Final_result);
           wbResult.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(Final_DocumentCompleted);

           DataRow dr = dt.NewRow();

           dr[0] = PANNumber[ids];
           dr[1] = status;
           dr[2] = name;

           dt.Rows.Add(dr);
           ++ids;

           if (ids < 20)
               wbResult.Navigate(vurl);
           else
           {
               dataGridView1.DataSource = dt;
           }
       }

It works fine, but I need to improve the performance as much as possible using advanced C# threading concepts, if any apply.

Pavel Anikhouski
  • Possible duplicate of [WebBrowser Control in a new thread](https://stackoverflow.com/questions/4269800/webbrowser-control-in-a-new-thread) – Pavel Anikhouski Jul 18 '19 at 10:28
  • No. In my application the URL is always the same, but there are N Ids to check with the same procedure. I want to improve the performance by processing the Ids in parallel, so that, for example, 10000 Ids can be processed in a short time. – Indexonindia Jul 18 '19 at 10:44
  • The `WebBrowser` control downloads pages asynchronously, so you could create more than one running concurrently in the UI thread. Is it mandatory to use multithreading? Multithreading is difficult, and is full of traps and caveats! – Theodor Zoulias Jul 18 '19 at 11:34
  • Any option other than multithreading will also do. I am new to Windows applications. – Indexonindia Jul 18 '19 at 12:18

1 Answer

Here is my suggestion. When button2 is clicked, a number of worker tasks are started. A reasonable number is 4, but you can try different numbers until you get the best performance. Each worker task uses its own WebBrowser control and processes a subset of the ids. For example, worker task #0 processes the ids at indices 0, 4, 8, 12, and 16, worker task #1 processes indices 1, 5, 9, 13, and 17, and so on. The handler then awaits the completion of all worker tasks, and finally updates the DataGridView. There is no multithreading involved. Everything happens in the UI thread, so no locking or other thread synchronization is required.

private async void button2_Click(object sender, EventArgs e)
{
    string url = "https://idscheckingsite.com";
    const int WORKER_TASKS_COUNT = 4;
    var workerTasks = new Task[WORKER_TASKS_COUNT];
    for (int i = 0; i < WORKER_TASKS_COUNT; i++)
    {
        workerTasks[i] = DoWorkAsync(i);
    }
    await Task.WhenAll(workerTasks);
    dataGridView1.DataSource = dt;

    async Task DoWorkAsync(int workerIndex)
    {
        using (var wb = new WebBrowser())
        {
            wb.ScriptErrorsSuppressed = true;
            for (int i = 0; i < ListsofIds.Length; i++)
            {
                if (i % WORKER_TASKS_COUNT != workerIndex) continue;
                wb.Navigate(url);
                await wb; // await for the next DocumentCompleted
                wb.Document.GetElementById("pannumber").InnerText = ListsofIds[i];
                wb.Document.GetElementById("frmType1").SetAttribute("value", "24Q");
                HtmlElement btnlink = wb.Document.GetElementById("clickGo1");
                btnlink.InvokeMember("Click");
                await wb; // await for the next DocumentCompleted
                string status = wb.Document.GetElementById("status").InnerText;
                string name = wb.Document.GetElementById("name").InnerText;
                DataRow dr = dt.NewRow();
                dr[0] = PANNumber[i];
                dr[1] = status;
                dr[2] = name;
                dt.Rows.Add(dr);
            }
        }
    }
}

The code above uses an interesting technique to simplify the navigation of the WebBrowser control. Instead of subscribing and unsubscribing manually to the DocumentCompleted event, it is doing it automatically by awaiting the WebBrowser control. Normally this is not possible, but we can make it possible by creating an extension method that returns a TaskAwaiter:

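using System;
using System.Runtime.CompilerServices; // TaskAwaiter<T>
using System.Threading.Tasks;
using System.Windows.Forms;

public static class WebBrowserExtensions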
{
    public static TaskAwaiter<Uri> GetAwaiter(this WebBrowser wb)
    {
        var tcs = new TaskCompletionSource<Uri>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (_, e) =>
        {
            wb.DocumentCompleted -= handler;
            tcs.TrySetResult(e.Url);
        };
        wb.DocumentCompleted += handler;
        return tcs.Task.GetAwaiter();
    }
}

Update: After using my code myself I found await wb to be a bit confusing, because the WebBrowser control has many events that could be awaited. So I made it more explicit and extensible by creating an async version of the event (instead of an awaiter):

public static class WebBrowserExtensions
{
    public static Task<Uri> DocumentCompletedAsync(this WebBrowser wb)
    {
        var tcs = new TaskCompletionSource<Uri>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (_, e) =>
        {
            wb.DocumentCompleted -= handler;
            tcs.TrySetResult(e.Url);
        };
        wb.DocumentCompleted += handler;
        return tcs.Task;
    }
}

It can be used like this:

await wb.DocumentCompletedAsync();

Then it becomes trivial to create more extension methods like NavigatedAsync or DocumentTitleChangedAsync for example.
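
As an illustration of that idea, here is a minimal sketch of what two such methods might look like, following the same subscribe-once pattern as DocumentCompletedAsync. The class name WebBrowserEventExtensions and both method bodies are assumptions made for illustration and are not part of the code above; the Navigated and DocumentTitleChanged events and the DocumentTitle property are standard members of the Windows Forms WebBrowser control.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical companion class; the name is chosen only for this sketch.
public static class WebBrowserEventExtensions
{
    // Completes with the URL reported by the next Navigated event.
    public static Task<Uri> NavigatedAsync(this WebBrowser wb)
    {
        var tcs = new TaskCompletionSource<Uri>();
        WebBrowserNavigatedEventHandler handler = null;
        handler = (_, e) =>
        {
            wb.Navigated -= handler; // one-shot: unsubscribe on first event
            tcs.TrySetResult(e.Url);
        };
        wb.Navigated += handler;
        return tcs.Task;
    }

    // Completes with the new title on the next DocumentTitleChanged event.
    public static Task<string> DocumentTitleChangedAsync(this WebBrowser wb)
    {
        var tcs = new TaskCompletionSource<string>();
        EventHandler handler = null;
        handler = (_, e) =>
        {
            wb.DocumentTitleChanged -= handler; // one-shot
            tcs.TrySetResult(wb.DocumentTitle);
        };
        wb.DocumentTitleChanged += handler;
        return tcs.Task;
    }
}
```

Like DocumentCompletedAsync, these would be awaited right after the call that triggers the event, for example wb.Navigate(url); await wb.NavigatedAsync();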


Update: Waiting endlessly is not very nice, so a timeout (expressed in milliseconds) could be added as an argument to the awaited extension method. Since the whole code is intended to run exclusively in the UI thread I used a System.Windows.Forms.Timer, although a CancellationToken would probably be more convenient in general. The code is a bit involved in order to avoid memory leaks, which could be an issue for an application intended to run for many hours and make thousands of web requests.

public static class WebBrowserExtensions
{
    public static Task<Uri> DocumentCompletedAsync(this WebBrowser wb, int timeout)
    {
        var tcs = new TaskCompletionSource<Uri>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        var timeoutRegistration = WithTimeout(tcs, timeout,
            () => wb.DocumentCompleted -= handler);
        handler = (_, e) =>
        {
            wb.DocumentCompleted -= handler;
            timeoutRegistration.Unregister();
            tcs.TrySetResult(e.Url);
        };
        wb.DocumentCompleted += handler;
        return tcs.Task;
    }
    public static Task<Uri> DocumentCompletedAsync(this WebBrowser wb)
    {
        return wb.DocumentCompletedAsync(30000); // Default timeout 30 sec
    }

    private static TimeoutRegistration WithTimeout<T>(
        TaskCompletionSource<T> tcs, int timeout, Action eventRemove)
    {
        if (timeout == Timeout.Infinite) return default;
        var timer = new System.Windows.Forms.Timer();
        timer.Tick += (s, e) =>
        {
            timer.Enabled = false;
            timer = null;
            eventRemove();
            eventRemove = null;
            tcs.SetException(new TimeoutException());
            tcs = null;
        };
        timer.Interval = timeout;
        timer.Enabled = true;
        return new TimeoutRegistration(() =>
        {
            if (timer == null) return;
            timer.Enabled = false;
            // Make everything null to avoid memory leaks
            timer = null;
            eventRemove = null;
            tcs = null;
        });
    }

    private struct TimeoutRegistration
    {
        private Action _unregister;
        public TimeoutRegistration(Action unregister)
        {
            _unregister = unregister;
        }
        public void Unregister()
        {
            if (_unregister == null) return;
            _unregister();
            _unregister = null;
        }
    }

}
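
For comparison, here is a rough sketch of how the CancellationToken alternative mentioned above might look. It is not the code I actually used; the class name WebBrowserCancellationExtensions is made up for this example, and the sketch assumes the method is always called from the UI thread, so that SynchronizationContext.Current is the Windows Forms context. The cancellation callback posts back to the UI thread instead of touching the control directly, because a token canceled by a timer fires on a thread-pool thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical variant of DocumentCompletedAsync driven by a CancellationToken.
public static class WebBrowserCancellationExtensions
{
    public static Task<Uri> DocumentCompletedAsync(
        this WebBrowser wb, CancellationToken cancellationToken)
    {
        // Assumption: we are on the UI thread, so a context is available.
        SynchronizationContext uiContext = SynchronizationContext.Current;
        if (uiContext == null)
            throw new InvalidOperationException("Must be called from the UI thread.");

        var tcs = new TaskCompletionSource<Uri>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        var registration = default(CancellationTokenRegistration);

        handler = (_, e) =>
        {
            wb.DocumentCompleted -= handler;
            registration.Dispose(); // stop listening for cancellation
            tcs.TrySetResult(e.Url);
        };
        wb.DocumentCompleted += handler;

        registration = cancellationToken.Register(() =>
        {
            // Post (not Send), so the canceling thread never blocks on the UI thread.
            uiContext.Post(state =>
            {
                wb.DocumentCompleted -= handler;
                tcs.TrySetCanceled(); // no-op if the event already completed the task
            }, null);
        });
        return tcs.Task;
    }
}
```

A 30-second timeout could then be expressed with a CancellationTokenSource created as new CancellationTokenSource(TimeSpan.FromSeconds(30)), passing its Token to the method. On timeout the awaiting code observes a TaskCanceledException instead of a TimeoutException.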

Update: As a side note, I see that you are suppressing script errors by using wb.ScriptErrorsSuppressed = true. Are you aware that you can configure the Internet Explorer version emulated by the WebBrowser control? To make the control emulate the latest (and final) version of Internet Explorer, version 11, add this code at the start of your program:

// Requires: using Microsoft.Win32;
Registry.SetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION",
    AppDomain.CurrentDomain.FriendlyName, 11000); // IE11
Theodor Zoulias
  • Hello Zoulias, thanks a lot for the reply. However, I am facing a couple of issues after implementing your code. `await wb;` makes me wait endlessly to see the result in the grid, and `await Task.WhenAll(workerTasks);` gives the error that it can be used only within an async method. Please help me get rid of these issues. Thanks - Martin – Indexonindia Jul 22 '19 at 00:40
  • @Indexonindia you need to add the `async` keyword in the event handler. See first line of code in my example. Keeping the UI responsive is actually mandatory for this code to work, because everything is happening in the UI thread. So no blocking code (like `Task.WaitAll` or `Thread.Sleep`) can be allowed. – Theodor Zoulias Jul 22 '19 at 06:40
  • Also the second `await` will never return if the `clickGo1` invokes an AJAX request, because in this case the `DocumentCompleted` will not fire. So you must `await` with `Task.Delay` until the results become available. For example: `while (wb.Document.GetElementById("status") == null) await Task.Delay(100);`. Or simply `await Task.Delay(5000)`, if you are not sure what condition to put inside the `while`. – Theodor Zoulias Jul 22 '19 at 06:49
  • Thank you so much for your code and detailed explanation. It works like a charm. A "type expected" error is reported at the line `if (timeout == Timeout.Infinite) return default;`. Please help me solve this error, as I am new to Windows applications and have worked more on web applications. Thanks a lot once again. – Indexonindia Jul 23 '19 at 06:19
  • @Indexonindia if you are using C# 7.0 or earlier you must change it to `return default(TimeoutRegistration);`. To use the literal [`default`](https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/statements-expressions-operators/default-value-expressions#default-literal-and-type-inference) you need to upgrade to C# 7.1. – Theodor Zoulias Jul 23 '19 at 06:42
  • OK, thank you so much. One more issue I am facing: the automation starts with a login page that has a CAPTCHA. So I use a login button whose click handler loads the login page in a webbrowser1 control, and I log in manually. After that, a second button click starts the verify page, which uses your code to scrape the data. Thanks, and sorry to bother you so much. – Indexonindia Jul 23 '19 at 13:47
  • @Indexonindia no problem, I am glad to help because this is a topic I am interested in too. My suggestion creates a number of `WebBrowser` controls (typically 4) that are not attached to a `Form`, so they are not visible. Is this the problem? Do you want to attach these controls to a form so that you can see the UI and manually log in on each of them? – Theodor Zoulias Jul 23 '19 at 14:23
  • Thanks for the reply once again. I am using one visible WebBrowser control to log in. Once logged in, the session is maintained, and the other WebBrowser controls use the same session to scrape the data. But in the visible browser a session timeout message appears, so I want to refresh the page at a 10-minute interval, that's all. Your code is working fine, and I am going to run it with more than 1000 ids and see how it goes. If any issues come up, I will update you. Thanks – Indexonindia Jul 23 '19 at 18:15
  • I see. Refreshing the visible WebBrowser1 every 10 min should be fairly easy. Just add a `Timer` and on the event `Tick` call `WebBrowser1.Refresh()`. It is interesting that all controls share the same session, I didn't know that. I wonder whether this would still be true if each control were running in a different thread... – Theodor Zoulias Jul 23 '19 at 19:47
  • Yes, it is still working fine with the one UI control, webbrowser1, for login. But with WORKER_TASKS_COUNT = 4 the loop stops and reports an "Object reference not set to an instance of an object" error. If you want, I will send you the whole project code and the Excel input values with the site login credentials, just for a clear test on your side. If that is OK, please send me your email id and I will send the project with the code, so that we can close this issue as soon as possible. It would be a great help, sir. - Martin Robert – Indexonindia Jul 24 '19 at 06:13
  • @Indexonindia Hi Martin. This site does not allow personal messages. I log in frequently to this site: https://lichess.org. You could create an account and send a message to the user [Skeftomilos](https://lichess.org/inbox/new?user=Skeftomilos). – Theodor Zoulias Jul 24 '19 at 07:38
  • Ohh, when I register it says that the cross-origin request is forbidden. Is there any other option, please? Thanks – Indexonindia Jul 24 '19 at 13:48
  • user id indexonindia is in facebook – Indexonindia Jul 24 '19 at 13:54
  • i have sent the message now in lichess.org sir . from melbenmartin – Indexonindia Jul 24 '19 at 17:21
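
The polling approach suggested in the comments above (`while (wb.Document.GetElementById("status") == null) await Task.Delay(100);`) could be wrapped in a reusable helper along the following lines. This is only a sketch under assumptions: the class name WebBrowserPollingExtensions, the method name WaitForElementAsync, and the default timeout and polling interval are all invented for illustration. It also null-checks `wb.Document`, which may be one way to avoid the "object reference not set" error reported above when an element is not present yet, although the actual cause in that project is not known.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helper for pages that update through AJAX, where
// DocumentCompleted may never fire after the button click.
public static class WebBrowserPollingExtensions
{
    // Polls the current document until an element with the given id exists.
    // Throws TimeoutException if it does not appear within timeoutMs.
    public static async Task<HtmlElement> WaitForElementAsync(
        this WebBrowser wb, string elementId,
        int timeoutMs = 30000, int pollMs = 100)
    {
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            // Document is null until the first page has loaded.
            HtmlElement element = wb.Document?.GetElementById(elementId);
            if (element != null) return element;

            if (stopwatch.ElapsedMilliseconds >= timeoutMs)
                throw new TimeoutException(
                    $"Element '{elementId}' did not appear within {timeoutMs} ms.");

            // Task.Delay yields to the message loop, so the UI thread stays responsive.
            await Task.Delay(pollMs);
        }
    }
}
```

In the worker loop, the second `await wb;` would then be replaced by something like `string status = (await wb.WaitForElementAsync("status")).InnerText;`. This assumes the "status" element exists only on the results view; if the form page already contains an element with that id, a different condition is needed.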