0

I have a nodejs script which use Puppeteer and works well. The code is basically this :

//Load href to visit into each tab
tabs = [];
var idxTab=0;
for(let hrefToVisit of hrefsToVisit) {
          tabs.push(await browser.newPage());
          await tabs[idxTab].goto(hrefToVisit, {waitUntil: 'domcontentloaded'});
          idxTab++;
}

while (true) {

  ...  

  //Refresh all tabs
  var promises = [];
  for (let tab of tabs ) {
    promises.push(tab.reload());
  }

  await Promise.all(promises);    

  try{ 

    let allNews = '';
    let firstNews = '';
    let pseudo = '';

    //Get the first news on the page
    allNews = await tab.$$("article.news");
    await tab.waitForTimeout(5000);
    firstNews = allNews[0];
    
    //Get the pseudo of this first news
    pseudo = await firstNews.$eval('.user.userInfo span', s => s.textContent.trim());
    console.log(new Date(), 'Pseudo : ', pseudo, '...');

    //Log HTML content for debug
    console.log('html when all ok : ');
    let bodyHTML = await tab.evaluate(() => document.body.innerHTML);
    console.log(bodyHTML);
    await tab.screenshot({path: 'ok.png'});
  }catch (err){
    console.log('html when not ok : ');
    let bodyHTML = await tab.evaluate(() => document.body.innerHTML);
    console.log(bodyHTML);
    await tab.screenshot({path: 'ko.png'});
    console.log(err);
  }

  ...

}

The problem is, I have 30% of time this error :

TypeError: Cannot read properties of undefined (reading '$eval')
    at main (script.js:327:40)

Sometimes it runs ok, sometimes don't.

So I logged html when it's ok and when it's not. I also made a screenshot.

Both HTML and screenshot are the same, the page is loaded and the elements are present in the html, so why I can't "eval" them ?

The part :

await tab.waitForTimeout(5000);

make the things a little better sometimes, but not enough. I tried to use other strategy, like using :

await waitForSelector('article.news');

Before evaluate anything, but it doesn't change anything, it's even worth because it doesn't wait a lot (the element is quickly displayed so it passed true). Waiting 5s each time seems a little better, but even if I put 10s, sometimes it failed anyway...

How can I debug this, to find the problem ?

user2178964
  • 124
  • 6
  • 16
  • 40
  • This will happen if the `allNews` array is empty. You should check `allNews.length` before the code that uses `firstNews = allNews[0]` – Barmar Sep 29 '22 at 15:15
  • I well understand the error, but there is no reason the allNews is empty, that's the problem. And when I put the complete HTML into logs, and compare when I have an error and when I've not, it's the same HTML ! So the result should be the same. – user2178964 Sep 29 '22 at 15:31
  • [Instead](https://serpapi.com/blog/puppeteer-antipatterns/#overusing-waitfortimeout) of `await tab.waitForTimeout(5000);` use `page.waitForSelector` to protect all of your element accesses. `waitForTimeout` is so poor practice that it was removed from the API. But you need to put `await` in front of your `waitForSelector` calls, e.g.`await page.waitForSelector('article.news')`. Anyway, this isn't a [mcve] because there's no site, so it's hard to diagnose other than to point out the poor practices here. "Sometimes it runs ok, sometimes don't." means you have a race condition. – ggorlen Sep 29 '22 at 15:39
  • See also [this answer](https://stackoverflow.com/a/73676564/6243352). – ggorlen Sep 29 '22 at 15:43
  • Thanks for the answer, I already tried the waitForSelector thing like I said in my post (I should have put the "await" in my post, but yes, I used it when I tried in my code). I think the "race" (concurrent ?) is a good lead, because I used tabs instead of multiple browser for each page. I do this to avoir consume too much resource and share cookies (before scraping I do some common treatment and I logged in on the website). – user2178964 Sep 29 '22 at 15:54
  • Multiple tabs ("pages") in one browser shouldn't lead to a race condition. Another issue problem is many global variables in this code. Please use eslint with full strength and put `const` in front of all variable declarations. For example, `allNews = await tab.$$("article.news");` should be `const allNews = await tab.$$("article.news");`. You create a `promises` array but never `await` that either. Basically, you need to `await` all async code or you'll wind up with race conditions (unless you intend for things to run concurrently). `tab.$$("article.news");` isn't waiting for the reload. – ggorlen Sep 29 '22 at 16:39
  • My bad, sorry, I didn't well recopy my code into this post. I just edited it. I had already a "let" declaration for all variables (const is not possible because value can change in each loop), and I already had a "await" statement for promises to reload. Any other lead ? I installed eslint, but it doesn't see any obvious lead here. Thx – user2178964 Sep 29 '22 at 20:15
  • Seems to work using destructuring for first element and waitForSelector, I fixed some others little things that ESLint told me, weird, but seems to work, maybe it's random, but it crashed like 1/50s today, and now no error in 5min, so... – user2178964 Sep 29 '22 at 20:35

1 Answers1

0

Weirdly, after fixing some little thing with ESLINT help (thx to ggorlen), it seems to work better.

I used "destructuring" to get the first news, and put back the "await tab.waitForSelector..." instead of waiting 5000ms :

let firstNews;
let otherNews;
let pseudo = '';
...
await tab.waitForSelector('article.news', { timeout: 5000 });
[firstNews, ...otherNews] = await tab.$$('article.news');
user2178964
  • 124
  • 6
  • 16
  • 40