I am loading a page, intercepting its requests, and when a certain element shows up I stop loading and extract the data I need...

Here is the problem that I am faced with.

Simplified, the code looks something like this:

async function loadPage()
    {
        var contentLoaded = false;
        var content;
        
        //now i say when element shows up, do something
        // it is page.waitForSelector but for simplicity, i use a timeout
        // because the problem is the same
        //i set it to "show up" in 10 seconds here.
        //when it shows up, it sets the content to 100 (extracts the content i want)
        //and stores it..
        
        setTimeout(()=>{
            content = 100;
            contentLoaded = true;
        },10000)



        //Here i have a function that loads the page
        //Intercepts request and handles them
        //Until content is loaded
        
        page.on('request', req =>{
             if(!contentLoaded)
             {
                 // keep loading page
             }
          })  
       

        // this is the piece of code i would like to not run,
        // UNTIL i either get the data, or a timeout error
        // from page.waitForSelector...
        
        //but javascript will run it if it's not busy with the
        //loading function above...
        
        // In 10 seconds the content shows 
        // and it's stored in DATA, but this piece of code has
        // already finished by the time that is done...
        // and it returns false...
        
        if(contentLoaded)
            {return content}
        else
            {return false}

    }

var x = loadPage();
x.then(console.log); //should log the data or false if an error occurred

Thank you all for taking the time to read this and help out. I'm a novice, so any feedback or reading material is welcome if you think there is something I'm not fully understanding.

  • It's good to strip out everything except the [mcve], but here the syntax isn't valid so it's really hard to tell what your code actually is. `getData = loadPage(URL);` is missing an `await` for starters, and it's unclear whether `evaluate` is returning anything. Combining `.then` and `async`/`await` is an antipattern, the promise chain never appears to be returned in the first place, and there's an unnecessary IIFE inside a non-async func, so there are at least 3-4 different possible points of failure here. Please [edit] the post to show a runnable, reproducible example. Thanks. – ggorlen May 04 '22 at 20:46
  • @ggorlen I have edited the post and tried to strip out everything while also simplifying it more. Also by doing this I understood the problem a bit better myself... Hopefully, now the problem is clearer, thank you for taking the time to help out :) – JustBaneIsFine May 05 '22 at 17:35
  • No problem and thanks for the edit! The problem is, I still can't really run the code or understand it. What site is this on? What data are you trying to get? There's only a single Puppeteer `page` call here and it doesn't do anything, so there's too much pseudocode and comments and not enough concrete substance. Usually, use `waitForSelector` or `waitForFunction` rather than intercepting requests, although that might work too. – ggorlen May 05 '22 at 17:56
  • I use waitForSelector, but I also intercept the requests in order to not load images etc. Anyway, this problem can be taken out of the Puppeteer context. You can delete the page.on block altogether. The setTimeout above changes the content value after some time, and the bottom code returns the content as it is now, which is undefined. Instead of returning it now, I want to return it after the timeout has changed it. But I want to do that without blocking JavaScript, so that other code can still run. – JustBaneIsFine May 05 '22 at 18:43
  • OK, well you can [promisify `setTimeout`](https://stackoverflow.com/questions/22707475/how-to-make-a-promise-from-settimeout) and `await` it before returning. But it's still possibly not a good solution to whatever you're trying to achieve. See [xy problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/233676#233676). If you're blocking for a request, Puppeteer already offers `page.waitForResponse`, so having to promisify stuff while using Puppeteer is a likely antipattern. – ggorlen May 05 '22 at 18:50
  • That might actually be one of the things I need. If I create a promise that resolves when the data has been populated, then I can return the data. But then the question becomes: how do I return that data outside? checkIfPopulated.then(()=>{return data}); would return to the main function, but it would not store it in the variable x. I believe this might be what I'm looking for: https://stackoverflow.com/questions/14220321/how-to-return-the-response-from-an-asynchronous-call – JustBaneIsFine May 05 '22 at 19:29
  • Yes, that's the canonical thread for people who are trying to make async code sync. You can't -- once you go async, everything is a promise if you want to use the data. – ggorlen May 05 '22 at 19:41

1 Answer

Solved

Simple explanation:

Here is what I was trying to accomplish:

  1. Intercept page requests so that I can decide what not to load, and speed up loading.
  2. Once an element shows up on the page, extract some data and return it.

I was trying to return it like this (note: all the browser setup and error handling is left out of these snippets, since it would just clutter the explanation):

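var data = loadPage(url);

async function loadPage(URL)
    {
     var data;

     // selector and extractData are placeholders
     // not awaited, so the callback below runs later
     page.waitForSelector(selector).then(async () => {
         var x = await page.evaluate(extractData); // page.evaluate returns data to x...
         data = x;
     });
    return data; // runs first, while data is still undefined
    }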

This doesn't work: return runs immediately, but the waitForSelector callback runs later, so the promise returned by loadPage always resolves to undefined.
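The same behaviour can be reproduced without Puppeteer. Here is a minimal sketch, assuming a short timer stands in for the element showing up; `delay`, `loadPageBroken` and `loadPageFixed` are made-up names for illustration, and the 50 ms delay is arbitrary:

```javascript
// Hypothetical helper: a promise that resolves after `ms` milliseconds.
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Broken: the timer callback runs after the function has already returned.
async function loadPageBroken() {
  let content;
  setTimeout(() => { content = 100; }, 50);
  return content; // the callback has not run yet
}

// Fixed: wait for the timer to finish before reading the value.
async function loadPageFixed() {
  let content;
  await delay(50).then(() => { content = 100; });
  return content;
}

loadPageBroken().then(value => console.log('broken:', value)); // logs "broken: undefined"
loadPageFixed().then(value => console.log('fixed:', value));   // logs "fixed: 100"
```

Neither version blocks JavaScript: other code keeps running while the timer is pending, and only the code after `await` (or inside `.then`) waits for the value.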

The correct way of doing it, or at least the way that works for me, is to return the whole promise and then extract the data from it:

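var data = loadPage(url);
data.then(result => { /* do what needs to be done with the data */ });

async function loadPage(URL)
    {
    // selector and extractData are placeholders
    var data = page.waitForSelector(selector).then(() => {
         var x = page.evaluate(extractData); // page.evaluate returns a promise for the data
         return x;
     });
    return data; // we return data as a promise
    }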
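The same idea can also be written with `await` inside the function, which makes it straightforward to return `false` when the wait fails, as the question asked. This is only a sketch: `waitForContent` is a made-up stand-in for the waitForSelector plus page.evaluate step, and the delays are arbitrary.

```javascript
// Hypothetical stand-in for "wait for the element, then extract the data".
// Resolves with `value` after `ms` milliseconds, or rejects once `timeout`
// milliseconds have passed, whichever comes first.
function waitForContent(value, ms, timeout) {
  return new Promise((resolve, reject) => {
    const okTimer = setTimeout(() => {
      clearTimeout(failTimer);
      resolve(value);
    }, ms);
    const failTimer = setTimeout(() => {
      clearTimeout(okTimer);
      reject(new Error('timed out'));
    }, timeout);
  });
}

// Returns a promise for the data, or for false if the wait timed out.
async function loadPage(ms, timeout) {
  try {
    return await waitForContent(100, ms, timeout);
  } catch (err) {
    return false;
  }
}

loadPage(50, 200).then(value => console.log('in time:', value));  // value is 100
loadPage(200, 50).then(value => console.log('too slow:', value)); // value is false
```

The caller still receives a promise either way, so the result has to be read with `.then` or with `await` inside another async function.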

I hope that's a solid enough explanation. If someone needs to see the whole thing, I could edit the question and put the full code there.
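For reference, the two goals listed at the top of this answer could be put together roughly as follows. This is a sketch under assumptions, not my original code: it expects a Puppeteer `page` created elsewhere, `url` and `selector` are supplied by the caller, and the choice of blocked resource types, the 10 second timeout and reading `textContent` are all illustrative.

```javascript
// Sketch only: `page` is a Puppeteer Page created elsewhere;
// `url` and `selector` are placeholders supplied by the caller.
async function loadPage(page, url, selector) {
  // Skip images, stylesheets and fonts to speed up loading.
  await page.setRequestInterception(true);
  page.on('request', req => {
    const blocked = ['image', 'stylesheet', 'font'];
    if (blocked.includes(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  });

  try {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Rejects with a timeout error if the element never appears.
    await page.waitForSelector(selector, { timeout: 10000 });
    // Runs in the page and returns the element's text to Node.
    return await page.$eval(selector, el => el.textContent);
  } catch (err) {
    return false; // timeout or navigation error
  }
}
```

It would be used the same way as above, for example `loadPage(page, url, selector).then(console.log)`, which logs the extracted text or `false`.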