I'm trying to get some links from a website with the help of puppeteer.

async function getLinks(){

    
    const first = 'first';
    const last = 'last';

    const browser = await browserControl.startBrowser();
    const page = await browser.newPage();

    await page.goto(url_baseStats);
    // await page.waitForNavigation();

    let links = await page.evaluate((first, last) => {
        try {
            let links = Array.from(document.querySelectorAll('a'), a => a.getAttribute('href'));
            links = links.slice(links.indexOf(first), links.indexOf(last));
            return links;
        } catch(err) {
            throw err;
        }
    });
    
    console.log("links:", links);
    return links;
    
}

I have two questions:

  1. When I run the debugger, it gets to the "await page.evaluate((..." line and then jumps straight to the "console.log(..." line. Why doesn't it wait?

  2. Why do I need to pass the variables first and last as parameters to the evaluation function? I defined them above, so shouldn't they be in the scope of the evaluation function?

Thanks in advance ;_)

Mohammad Yaser Ahmadi
  • It's time for more debugging: make your function return a value _after_ your catch, too, so that you can check whether it actually just immediately errored out. As for why you need to pass first and last: [you don't](https://pptr.dev/#?product=Puppeteer&version=v7.0.1&show=api-pageevaluatepagefunction-args), unless you want values from your "not running in the browser" context to get passed into the browser context, then you do. In this case, it looks like you forgot to set `evaluate(..., first, last)` so that those two values make it into your function. – Mike 'Pomax' Kamermans Feb 08 '21 at 16:41
  • Apparently (according to the docs) `page.evaluate()` waits only if the function callback returns a Promise. (The documentation is not super-clear however.) – Pointy Feb 08 '21 at 16:46
  • I don't know why `first` and `last` are there either; parameters to the `.evaluate()` callback should be passed after the callback function in the `.evaluate()` call itself. Why are they there? What values did you expect them to have? *edit* oh wait; yes take the arguments `first, last` out of the callback function. As it is, they'll be `undefined` so the callback won't work as you expect. – Pointy Feb 08 '21 at 16:51
  • (And in this case waiting for a Promise wouldn't make any difference, because it looks like that callback is synchronous anyway.) – Pointy Feb 08 '21 at 16:51
  • OK, regarding 1: the array.slice parameters were wrong, so the evaluation function returned an empty array. But why doesn't the debugger stop at the breakpoints I placed inside the evaluation function? That's why I couldn't see that the querySelector was working and the array.slice call was the buggy part. – MrGreenPepper Feb 08 '21 at 17:17
  • @Pointy thanks, but when I take the variables out, I get: "ReferenceError: first is not defined" – MrGreenPepper Feb 08 '21 at 17:22
  • I even placed a console.log("test"); statement inside the evaluation function. The function works, but nothing appears in the console and the breakpoints are still ignored. I'm using VS Code and Node. – MrGreenPepper Feb 08 '21 at 17:27
  • Well then you have to pass `first` and `last` as the second and third arguments to the `.evaluate()` call. (And put the parameters back.) – Pointy Feb 08 '21 at 17:28
  • The function is transmitted as text from the Node domain to the browser domain, so normal scope rules don't make sense. – Pointy Feb 08 '21 at 17:29
  • @Pointy: Hey really THANKS for your attention and time, have a nice day ;_) – MrGreenPepper Feb 08 '21 at 19:12
  • *"I have 2 questions"*: that is one too many. A question should be one question. You can post another separately if needed. – trincot Feb 08 '21 at 19:33
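
The empty array mentioned in the comments is what you would expect from the question's code: `first` and `last` arrive as `undefined` inside the callback, `indexOf` returns -1 for both, and `slice(-1, -1)` is empty. A minimal sketch in plain Node.js follows; the `hrefs` values and the `sliceBetween` helper are made up for illustration and are not part of the original code.

```javascript
// Hypothetical href values standing in for what querySelectorAll('a') might yield.
const hrefs = ['/home', '/stats/a', '/stats/b', '/stats/c', '/about'];

// Same slice logic as in the question, plus an explicit guard for missing markers.
function sliceBetween(list, first, last) {
    const start = list.indexOf(first);
    const end = list.indexOf(last);
    if (start === -1 || end === -1) {
        return []; // make the "marker not found" case explicit
    }
    return list.slice(start, end); // end index is exclusive, so `last` is left out
}

// What the question's code effectively computed with first/last undefined:
const unguarded = hrefs.slice(hrefs.indexOf(undefined), hrefs.indexOf(undefined));
console.log(unguarded); // []

console.log(sliceBetween(hrefs, '/stats/a', '/stats/c')); // [ '/stats/a', '/stats/b' ]
```

Whether the `last` link itself should be included is a separate decision; the sketch keeps the exclusive end from the question.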

1 Answer

For later viewers, here is a summary of the comments:

Everything inside the function passed to page.evaluate() runs in the context of the browser page, not in Node.js. The function is sent to the browser as text, so some of the usual JS rules don't apply. For example:

  1. Anything you log shows up in the browser's console, which you won't see when running headless.
  2. You can't hit a Node breakpoint inside the function.
  3. The usual closure scope doesn't apply; values from Node have to be passed as extra arguments to page.evaluate(), after the function.
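
To see why the scope is lost, here is a rough sketch in plain Node.js (no Puppeteer needed). `runAsText` is a hypothetical helper that only imitates the idea of shipping a function as source text to another context; it is not how Puppeteer is actually implemented.

```javascript
// Hypothetical stand-in for sending a function to another JavaScript context
// as source text, together with serialized arguments.
function runAsText(fn, ...args) {
    // JSON round-trip imitates the arguments being serialized on the way over.
    const shippedArgs = JSON.parse(JSON.stringify(args));
    // Rebuilding the function from its source drops the closure it was defined in.
    const rebuilt = new Function(`return (${fn.toString()});`)();
    return rebuilt(...shippedArgs);
}

function demo() {
    const first = 'first'; // local variable, exists only in this closure

    let withoutArgs;
    try {
        // The rebuilt function cannot see the local `first` any more.
        withoutArgs = runAsText(() => first);
    } catch (err) {
        withoutArgs = err.name; // ReferenceError, as reported in the comments
    }

    // Passing the value explicitly, analogous to page.evaluate(fn, first, last).
    const withArgs = runAsText((first) => first, first);

    return { withoutArgs, withArgs };
}

console.log(demo()); // { withoutArgs: 'ReferenceError', withArgs: 'first' }
```

In the question's code the corresponding fix would be to keep `(first, last) =>` as the callback parameters and call `page.evaluate(callback, first, last)`. For point 1, Puppeteer also lets you forward browser logs to Node with something like `page.on('console', msg => console.log(msg.text()))`.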

;_)