
This sequence of actions only works with Thread.Sleep: one step needs a 1-second pause and another a 2-second pause. I don't think relying on Thread.Sleep/Task.Delay is a good idea, because the required delay can differ from computer to computer. How do I execute this sequence without Thread.Sleep? Or is it OK to use Thread.Sleep/Task.Delay?

        private async void ButtonFind_Click(object sender, EventArgs e)
        {

            //Action1
            string jsScript1 = "document.getElementById('story').value=" + '\'' + textFind.Text + '\'';
            await chrome.EvaluateScriptAsync(jsScript1);

            //Action2
            string jsScript2 = "document.querySelector('body > div.wrapper > div.header > div.header44 > div.search_panel > span > form > button').click();";
            await chrome.EvaluateScriptAsync(jsScript2);

            //Action3
            Thread.Sleep(1000); //it is necessary to set exactly 1 second
            string jsScript3 = "document.getElementsByTagName('a')[2].click();";
            await chrome.EvaluateScriptAsync(jsScript3);

            //Action4
            Thread.Sleep(2000); //it is necessary to set exactly 2 seconds
            string jsScript4 = "document.querySelector('#dle-content > div.section > ul > li:nth-child(3)').click();";
            await chrome.EvaluateScriptAsync(jsScript4);
        }

I tried waiting on the task, but it didn't help:

...
var task4 = chrome.EvaluateScriptAsync(jsScript4);
task4.Wait();

I also tried waiting for the DOM to finish loading, which didn't help either:

            string jsScript4 = @"
                  if( document.readyState !== 'loading' ) {
                      myInitCode();
                  } else {
                      document.addEventListener('DOMContentLoaded', function () {
                          myInitCode();
                      });
                  }

                  function myInitCode() {
                   var a = document.querySelector('#dle-content > div.section > ul > li:nth-child(3)').click();
                  return a;
                  }
              ";
            
            chrome.EvaluateScriptAsync(jsScript4);

Update (21.04.2022)


In the third action, instead of Thread.Sleep, I'm using a while loop. The algorithm seems correct, but for some reason the application hangs after I press the button:

                bool test = false;
                while(test == false)
                {
                    string myScript = @"
                        (function(){
                            var x = document.getElementsByTagName('a')[1].outerText;
                            return x;
                        })();
                        ";
                    var task = chrome.EvaluateScriptAsync(myScript);
                    task.ContinueWith(x =>
                    {
                        if (!x.IsFaulted)
                        {
                            var response = x.Result;
                            if (response.Success == true)
                            {
                                var final = response.Result;
                                if (final.ToString() == textFind.Text)
                                {
                                    MessageBox.Show("You found the link");
                                    test = true;
                                }
                                else
                                {
                                    MessageBox.Show("You do not found the link");
                                }
                            }
                        }
                    }, TaskScheduler.FromCurrentSynchronizationContext());
                }

Update (23.04.2022)


            string jsScript1 = "document.getElementById('story').value=" + '\'' + textFind.Text + '\'' + ";"
                + @"
                Promise.resolve()
                    .then(() => document.querySelector('body > div.wrapper > div.header > div.header44 > div.search_panel > span > form > button').click())
                    .then(() => {
                        var target = document.body;
                        const config = {
                            childList: true,
                            attributes: true,
                            characterData: true,
                            subtree: true,
                            attributeFilter: ['id'],
                            attributeOldValue: true,
                            characterDataOldValue: true
                        };
                        const callback = function(mutations) {
                            document.addEventListener('DOMContentLoaded', function() {
                                if (document.getElementsByTagName('a')[1].innerText == 'Troy') {
                                    alert('I got that link');
                                }
                            }, true);
                        };
                        const observer = new MutationObserver(callback);
                        observer.observe(target, config);
                    });
                ";

            var task1 = chrome.EvaluateScriptAsPromiseAsync(jsScript1);
            task1.Wait();

I used a MutationObserver wrapped in a promise and evaluated the promise with EvaluateScriptAsPromiseAsync. That didn't help either. I've concluded that the injected JavaScript does not survive clicking the search button or navigating to another page. How do I preserve the JavaScript code/request and continue it after clicking the search button or navigating to another page?

  • Firstly, as you're not returning a result, I'd suggest wrapping your code in an IIFE and executing a single script. https://developer.mozilla.org/en-US/docs/Glossary/IIFE Secondly, it's unlikely the behaviour you are seeing is CefSharp-specific; it sounds like the site has some sort of animation/transition that takes a second to complete. I'm speculating, as you've not provided actual details. To detect DOM changes you can use a MutationObserver: https://developer.mozilla.org/en-US/docs/Web/API/MutationObserver – amaitland Apr 18 '22 at 19:54
  • amaitland, sir, thank you for your feedback. Please read my question here: https://stackoverflow.com/questions/71921532/javascript-waiting-for-load-page-and-click – Vlad Yorkyee Apr 19 '22 at 13:03
  • amaitland, sir, I added a new update above. Please read it. – Vlad Yorkyee Apr 21 '22 at 08:03
  • Not surprised your code crashes: you're calling EvaluateScriptAsync in a tight loop, and your code doesn't wait for the previous execution to finish before calling it again. Use a MutationObserver wrapped in a promise, and use EvaluateScriptAsPromiseAsync to evaluate the promise. Polling isn't recommended. – amaitland Apr 21 '22 at 08:20
  • amaitland, sir, I added another update above. Please read it. – Vlad Yorkyee Apr 23 '22 at 05:12
  • Did you use your preferred search engine to look up "mutationobserver promise"? There should be lots of examples out there. You need to create the MutationObserver before triggering the action, then resolve the promise when you get what you are looking for. – amaitland Apr 23 '22 at 08:46
  • Something like https://blog.frankmtaylor.com/2017/06/16/promising-a-mutation-using-mutationobserver-and-promises-together/ looks like an OK guide. It comes with an example: https://gist.github.com/paceaux/8f8d5d1a57409c5b5f5f2519ceb8ac83 – amaitland Apr 23 '22 at 08:50
  • Rereading your last update: if your code triggers a navigation, then you'll need to wait for the page to load before attempting to access the element; that wasn't clear to me before. See https://stackoverflow.com/a/71978317/4583726 – amaitland Apr 23 '22 at 19:24
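
A minimal sketch of what these comments describe, assuming the change you wait for happens on the same page (no navigation): the MutationObserver is created before the action runs, and the promise resolves as soon as a condition you supply is true, or rejects after a timeout. `observeThenAct` and its parameters are hypothetical names, not part of CefSharp.

```javascript
// Hypothetical helper: observe first, then act, then resolve when isDone() is truthy.
function observeThenAct(target, isDone, action, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    let settled = false;
    let timer = null;
    let observer = null;

    // Settle exactly once, always stopping the observer and the timeout.
    const finish = (settle, value) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      observer.disconnect();
      settle(value);
    };

    // Called for every batch of DOM mutations (and once after the action).
    const check = () => {
      const result = isDone();
      if (result) finish(resolve, result);
    };

    // 1. Start observing BEFORE triggering the action, so no change is missed.
    observer = new MutationObserver(check);
    observer.observe(target, { childList: true, subtree: true, characterData: true });

    // 2. Give up after timeoutMs instead of waiting forever.
    timer = setTimeout(
      () => finish(reject, new Error(`Condition not met within ${timeoutMs} ms`)),
      timeoutMs
    );

    // 3. Trigger the action (e.g. a click that updates the page in place).
    try {
      action();
    } catch (err) {
      finish(reject, err);
      return;
    }

    // Check once right away, in case the condition is already true
    // and no further mutation arrives.
    check();
  });
}
```

For example, `return observeThenAct(document.body, () => document.querySelector('#dle-content > div.section > ul > li:nth-child(3)'), () => document.getElementsByTagName('a')[2].click());` could end a script passed to EvaluateScriptAsPromiseAsync (which, as far as I know, wraps the script in a function, hence the `return`). If the action navigates to another page, the observer and the promise are discarded together with the old page, so in that case the waiting has to happen on the C# side instead.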

2 Answers

You should never rely on sleeps, because timing varies between computers and, even on the same computer, the time a web page needs to load can vary.

I do a lot of scraping and, in my opinion, the best way to manage this is to work from the JavaScript side: you inject/run your JavaScript to fill in controls, click buttons, and so on.

With this approach, the problem is that navigation makes you lose state. When you navigate to another page, your JavaScript starts from scratch. I solve this by sharing the data that must persist between JavaScript and C# through a bound object, and by re-injecting JavaScript on each page.

For example, you can run actions 1, 2 and 3 with a single piece of JavaScript. Before clicking the button, you use your bound object to tell your C# code that you are going to the second page.

When the second page has loaded, you run the JavaScript for the second page (you know which step you are on, so you can inject the JavaScript code for that page).

In all cases, your JavaScript code needs some mechanism to wait, for example a timer that waits until your controls appear. This way you can run your JavaScript without waiting for the page to be fully loaded (those events are sometimes hard to manage).
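
A minimal sketch of such a timer-based wait, assuming a plain browser environment; `waitFor` and its options are hypothetical names. It polls a condition instead of sleeping for a fixed time, so it continues as soon as the element exists and fails with a clear error instead of hanging forever.

```javascript
// Poll `check` every intervalMs until it returns a truthy value, then resolve
// with that value; reject if timeoutMs elapses first.
function waitFor(check, { intervalMs = 100, timeoutMs = 10000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const tick = () => {
      let value;
      try {
        value = check();
      } catch {
        // Treat errors (e.g. reading a property of a missing element) as "not ready yet".
        value = undefined;
      }
      if (value) {
        resolve(value);
      } else if (Date.now() >= deadline) {
        reject(new Error(`Timed out after ${timeoutMs} ms`));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}
```

For example: `waitFor(() => document.querySelector('#dle-content > div.section > ul > li:nth-child(3)')).then((li) => li.click());`. Polling is the simplest option; a MutationObserver avoids the repeated checks.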

UPDATE

My scraping library is huge. I'll show the pieces you need to do the work, but you'll have to assemble them yourself.

We create a BoundObject class:

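public class BoundObject
{
    public BoundObject(IWebBrowser browser)
    {
        this.Browser = browser;
    }

    // The custom browser that receives the messages sent from JavaScript
    public IWebBrowser Browser { get; }

    // Called from JavaScript as window.bound.onJavaScriptMessage(json)
    public void OnJavaScriptMessage(string message)
    {
        this.Browser.OnJavaScriptMessage(message);
    }
}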

Here IWebBrowser is the interface of my custom browser (a wrapper that manages everything I need), not to be confused with CefSharp's own IWebBrowser. Create a browser class, for example CustomBrowser, that implements this interface.

Create a method that registers your bound object:

public void SetBoundObject()
{
    // To get events in C# from JavaScript
    try
    {
        // "this" is the custom browser that implements IWebBrowser
        var boundObject = new BoundObject(this);
        this._browserInternal.JavascriptObjectRepository.Register(
            "bound", boundObject, false, BindingOptions.DefaultBinder);

        this.BoundObject = boundObject;
    }
    catch (ArgumentException ex)
    {
        // An ArgumentException for "bound" means it is already registered; rethrow anything else
        if (ex.ParamName != "bound")
        {
            throw;
        }
    }
}

_browserInternal is the CefSharp browser. You must call that method on every page load, that is, whenever you navigate. Once it is registered, the JavaScript side has a window.bound object with an onJavaScriptMessage function. You can then define a function in JavaScript like this:

function sendMessage(msg) {
    var json = JSON.stringify(msg);
    window.bound.onJavaScriptMessage(json);
    return this;
};

You can now send any object to your C# application and handle it in the OnJavaScriptMessage method of your CustomBrowser. In that method I implement my own message protocol (similar to a typical protocol over sockets, or the Windows message system) and call an OnMessage method that I implement in classes inheriting from CustomBrowser.
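
A sketch of the JavaScript half of such a protocol, assuming a hypothetical `{ type, payload, url }` JSON envelope; `postToHost` is a hypothetical name, and the C# side would parse the JSON and switch on `type`. It uses the bound object registered by SetBoundObject when present and otherwise falls back to CefSharp.PostMessage, the alternative suggested in the comments on this answer.

```javascript
// Hypothetical message envelope: { type, payload, url } serialized as JSON.
function postToHost(type, payload) {
  const message = { type, payload, url: window.location.href };
  const json = JSON.stringify(message);
  if (window.bound && typeof window.bound.onJavaScriptMessage === 'function') {
    // Bound object registered from C# under the name "bound"
    window.bound.onJavaScriptMessage(json);
  } else if (window.CefSharp && typeof window.CefSharp.PostMessage === 'function') {
    // CefSharp.PostMessage; C# receives it through the JavascriptMessageReceived event
    window.CefSharp.PostMessage(json);
  } else {
    throw new Error('No channel to the host application is available');
  }
  return message;
}
```

For example, calling `postToHost('navigating', { nextStep: 'results' });` right before clicking the search button tells the C# code which script to inject when the next page loads.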

Sending information to JavaScript is trivial using the CefSharp browser's ExecuteScriptAsync.

Going further

When I work on an intensive scraping job, I create scripts with classes that manage the entire website I'm scraping: classes to log in, navigate to different sections, fill in forms... as if I were the owner of the website. Then, when a page loads, I inject my scripts and can use my own classes on the remote website, which makes scraping a piece of cake.
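
A minimal sketch of such a class, assuming the page structure from the question (the selectors are copied from it); `SiteScraper` and its methods are hypothetical. The `doc` parameter defaults to the page's `document` and is only there so the class can be exercised outside a browser.

```javascript
// Hypothetical site-specific helper; selectors taken from the question.
const SEARCH_INPUT_ID = 'story';
const SEARCH_BUTTON =
  'body > div.wrapper > div.header > div.header44 > div.search_panel > span > form > button';

class SiteScraper {
  // `doc` defaults to the page's document; it can be replaced for testing.
  constructor(doc = document) {
    this.doc = doc;
  }

  // Fill in the search box and submit. Submitting navigates, so this page's
  // script state (including this object) is lost afterwards.
  search(text) {
    const input = this.doc.getElementById(SEARCH_INPUT_ID);
    const button = this.doc.querySelector(SEARCH_BUTTON);
    if (!input || !button) throw new Error('Search form not found');
    input.value = text;
    button.click();
  }

  // Text of the link at `index`, or null if there is no such link (yet).
  linkText(index) {
    const link = this.doc.getElementsByTagName('a')[index];
    return link ? link.innerText : null;
  }

  // Click the link at `index`, failing loudly if it does not exist.
  clickLink(index) {
    const link = this.doc.getElementsByTagName('a')[index];
    if (!link) throw new Error(`No link at index ${index}`);
    link.click();
  }
}
```

Because the search submits and navigates, the script has to be injected again on the results page, where `new SiteScraper().clickLink(2)` would continue.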

My scripts are embedded resources, so they are part of my final executable. In debug builds I read them from disk so I can edit, reload and test until my scripts work. With DevTools you can experiment in the console until you get the code you want; then you add it to your JavaScript classes and reload.

You can add simple JavaScript with ExecuteScriptAsync, but with large files you run into problems escaping quotes and the like.
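
The quoting problem also affects short scripts built by string concatenation, like the `'\'' + textFind.Text + '\''` in the question, which breaks as soon as the text contains an apostrophe. A sketch of one common fix, assuming values are encoded with JSON.stringify, which produces a valid JavaScript string literal for any string; `buildSetValueScript` is a hypothetical helper. On the C# side the same idea would mean serializing the value with a JSON serializer (for example System.Text.Json's JsonSerializer.Serialize) before concatenating.

```javascript
// Build a script that sets an input's value. JSON.stringify quotes and escapes
// the values, so apostrophes, double quotes, backslashes and newlines are safe.
function buildSetValueScript(elementId, value) {
  return `document.getElementById(${JSON.stringify(elementId)}).value = ${JSON.stringify(value)};`;
}
```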

To insert an entire script file instead, implement ISchemeHandlerFactory to create and return an IResourceHandler. That resource handler must have a ProcessRequestAsync method, which receives a request whose request.Url you can use to locate your scripts:

  this.ResponseLength = stream.Length;
  this.MimeType = GetMimeType(fileExtension);
  this.StatusCode = (int)HttpStatusCode.OK;
  this.Stream = stream;

  callback.Continue();
  return true;

Here, stream may be a MemoryStream containing the contents of your script file.
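
Once a scheme handler serves the script files, one way to load one into the current page is to add a `<script>` element whose `src` uses your custom scheme, for example from ExecuteScriptAsync after each page load. A sketch, assuming a hypothetical scheme name `myscripts://`; whether the page accepts it depends on how the scheme is registered and on the site's Content Security Policy. `injectScript` is a hypothetical helper that resolves once the script has loaded.

```javascript
// Add <script src="..."> to the page and resolve when it has loaded.
function injectScript(src, doc = document) {
  return new Promise((resolve, reject) => {
    const el = doc.createElement('script');
    el.src = src;
    // Attach handlers before inserting so a fast load is not missed.
    el.onload = () => resolve(el);
    el.onerror = () => reject(new Error(`Failed to load ${src}`));
    (doc.head || doc.documentElement).appendChild(el);
  });
}
```

For example, `injectScript('myscripts://scripts/site.js').then(() => new SiteScraper().search('Troy'))` would first load a site script (here assumed to define a `SiteScraper` class) and then use it.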

Victor
  • Could you show some example code? – Vlad Yorkyee Apr 26 '22 at 12:22
  • For simple message-passing scenarios I'd suggest using CefSharp.PostMessage instead of binding an object. See https://github.com/cefsharp/CefSharp/wiki/Frequently-asked-questions#JSEvent for a basic example. You can use an IJavascriptCallback for two-way communication. https://github.com/cefsharp/CefSharp/issues/2775#issuecomment-498454221 – amaitland Apr 26 '22 at 19:34
  • It's true! For something simple, @amaitland's comment is the best option. – Victor Apr 26 '22 at 20:09
  • Could you please help me implement my code using CefSharp.PostMessage? – Vlad Yorkyee Apr 29 '22 at 10:00
  • @VladYorkyee Check this https://stackoverflow.com/a/71364204/18452174 – Victor Apr 29 '22 at 10:19

As your JavaScript causes a navigation, you need to wait for the new page to load.

You can use something like the following to wait for the page load.

using System;
using System.Threading.Tasks;
using CefSharp;

// Extension methods must be declared in a static class
public static class WebBrowserExtensions
{
    public static Task<LoadUrlAsyncResponse> WaitForLoadAsync(this IWebBrowser browser)
    {
        var tcs = new TaskCompletionSource<LoadUrlAsyncResponse>(TaskCreationOptions.RunContinuationsAsynchronously);

        EventHandler<LoadErrorEventArgs> loadErrorHandler = null;
        EventHandler<LoadingStateChangedEventArgs> loadingStateChangeHandler = null;

        loadErrorHandler = (sender, args) =>
        {
            //Actions that trigger a download will raise an aborted error.
            //Generally speaking Aborted is safe to ignore
            if (args.ErrorCode == CefErrorCode.Aborted)
            {
                return;
            }

            //If LoadError was called then we'll remove both our handlers
            //as we won't need to capture LoadingStateChanged, we know there
            //was an error
            browser.LoadError -= loadErrorHandler;
            browser.LoadingStateChanged -= loadingStateChangeHandler;

            tcs.TrySetResult(new LoadUrlAsyncResponse(args.ErrorCode, -1));
        };

        loadingStateChangeHandler = (sender, args) =>
        {
            //Wait for the whole page to finish loading, not just the first frame
            if (!args.IsLoading)
            {
                browser.LoadError -= loadErrorHandler;
                browser.LoadingStateChanged -= loadingStateChangeHandler;
                var host = args.Browser.GetHost();

                var navEntry = host?.GetVisibleNavigationEntry();

                int statusCode = navEntry?.HttpStatusCode ?? -1;

                //By default 0 is some sort of error, we map that to -1
                //so that it's clearer that something failed.
                if (statusCode == 0)
                {
                    statusCode = -1;
                }

                tcs.TrySetResult(new LoadUrlAsyncResponse(statusCode == -1 ? CefErrorCode.Failed : CefErrorCode.None, statusCode));
            }
        };

        browser.LoadingStateChanged += loadingStateChangeHandler;
        browser.LoadError += loadErrorHandler;

        return tcs.Task;
    }
}

// usage example 
private async void ButtonFind_Click(object sender, EventArgs e)
{

    //Action1
    string jsScript1 = "document.getElementById('story').value=" + '\'' + textFind.Text + '\'';
    await chrome.EvaluateScriptAsync(jsScript1);

    //Action2
    string jsScript2 = "document.querySelector('body > div.wrapper > div.header > div.header44 > div.search_panel > span > form > button').click();";
   
    await Task.WhenAll(chrome.WaitForLoadAsync(), 
      chrome.EvaluateScriptAsync(jsScript2));

    //Action3
    string jsScript3 = "document.getElementsByTagName('a')[2].click();";
    await Task.WhenAll(chrome.WaitForLoadAsync(), 
      chrome.EvaluateScriptAsync(jsScript3));


    //Action4
    string jsScript4 = "document.querySelector('#dle-content > div.section > ul > li:nth-child(3)').click();";
    await chrome.EvaluateScriptAsync(jsScript4);
}
amaitland
  • Version 101 includes a method called WaitForNavigationAsync, which can be used instead of adding a custom extension method. https://github.com/cefsharp/CefSharp/commit/98cb4cfc140ee6f527cfff4877d0a1a741f414dc – amaitland May 12 '22 at 20:10