I am trying to scrape this page, https://poe.ninja/challenge/builds?time-machine=day-6, using Puppeteer. I read "Puppeteer page.evaluate querySelectorAll return empty objects" and a lot of similar questions here, but none of them solved my problem.

Here is my code:

const scrapeNinja = async () => {
    const browser = await puppeteer.launch({headless: false})

    const page = await browser.newPage()

    await page.goto(`https://poe.ninja/challenge/builds?time-machine=day-6`, {
        waitUntil: 'domcontentloaded',
    })

    const getArray = await page.evaluate(() => {
        return Array.from(document.querySelectorAll(
                '#openSidebar > div > section:nth-child(3) > div > div > div > ul li .css-1h2ruwl'
            )).map(e => e.textContent)
    })

    console.log(getArray)
}

I know the values returned from page.evaluate must be serializable. Isn't Array.from(document.querySelectorAll('#openSidebar > div > section:nth-child(3) > div > div > div > ul li .css-1h2ruwl')).map(e => e.textContent) a serializable value? When I run it in the browser's dev tools console, it returns exactly what I want, but in Node.js it only returns an empty array.

Am I doing something wrong?

001
  • Are you sure your selector works? Try it in the page on the browser console. – code May 11 '22 at 20:09
  • Yes, I'm sure it works. I tried it in the console and it gave me what I want. – 001 May 11 '22 at 20:11
  • Perhaps the server detected an automated request and blocked your attempt. Try taking a screenshot of the page during the request and see what you get. – code May 11 '22 at 20:17
  • God, I feel so dumb... Thanks for your advice. The screenshot shows that the page wasn't fully loaded. After I added a timeout of up to 3 seconds, it returns what I want. – 001 May 11 '22 at 20:27
  • Glad you got it. Maybe you should `waitUntil` `networkidle2`, but I'm not sure. – code May 11 '22 at 20:36
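
The screenshot check suggested in the comments above can be sketched roughly as follows. The helper name `debugScreenshot` and the output file `debug.png` are made up for illustration, and `page` is assumed to be a Puppeteer page that has already navigated to the target URL.

```javascript
// Hypothetical helper: capture what the automated browser actually rendered,
// so it can be compared with what a normal browser shows.
async function debugScreenshot(page, path = 'debug.png') {
  // fullPage: true captures the whole scrollable page, not only the viewport.
  await page.screenshot({ path, fullPage: true });
  return path;
}
```

Calling `await debugScreenshot(page)` right before the `page.evaluate` call shows the page state at the moment the selector is queried.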

1 Answer

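It looks like the problem really is the waiting: you are querying for the elements before the page has finished loading. `domcontentloaded` fires as soon as the initial HTML is parsed, which can be before the page's scripts have rendered the list.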

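const puppeteer = require('puppeteer')

const scrapeNinja = async () => {
  const browser = await puppeteer.launch({headless: false})

  const page = await browser.newPage()

  // networkidle2: resolve once there are no more than 2 network connections for at least 500 ms
  await page.goto(`https://poe.ninja/challenge/builds?time-machine=day-6`, {
    waitUntil: 'networkidle2',
  })

  // $$eval runs querySelectorAll in the page and passes the matches to the callback as an array
  const getArray = await page.$$eval('#openSidebar > div > section:nth-child(3) > div > div > div > ul li .css-1h2ruwl',
    els => els.map(item => item.textContent))

  console.log(getArray)

  await browser.close()
}

scrapeNinja()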

This code works for me, and with `page.$$eval` you don't even have to build the array yourself with `Array.from`. For pages like this one, use `networkidle2` as the `waitUntil` option.
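
An alternative worth considering is to wait for the target elements themselves with `page.waitForSelector` instead of waiting for the network to go idle. The following is only a minimal sketch and has not been verified against this site. The helper names `collectTexts` and `scrapeNinjaWithSelectorWait` and the 10-second timeout are assumptions chosen for illustration.

```javascript
// Hypothetical helper: wait until at least one match exists, then read the text.
async function collectTexts(page, selector, timeout = 10000) {
  // Rejects with a timeout error if nothing matches within `timeout` ms.
  await page.waitForSelector(selector, { timeout });
  return page.$$eval(selector, els => els.map(el => el.textContent));
}

async function scrapeNinjaWithSelectorWait() {
  // Required here so collectTexts stays usable without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: false });
  try {
    const page = await browser.newPage();
    await page.goto('https://poe.ninja/challenge/builds?time-machine=day-6', {
      waitUntil: 'domcontentloaded',
    });
    return await collectTexts(
      page,
      '#openSidebar > div > section:nth-child(3) > div > div > div > ul li .css-1h2ruwl'
    );
  } finally {
    // Close the browser even if the wait times out.
    await browser.close();
  }
}
```

Run it with `scrapeNinjaWithSelectorWait().then(console.log)`. Waiting on the selector ties the wait to the data you actually need, so it does not depend on how much background network traffic the page generates.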

  • Thanks! This makes my code clearer. By the way, does `page.$$eval` do the same thing as `page.evaluate`? What's the difference between them? – 001 May 13 '22 at 12:42
  • The results are the same. A more detailed answer is here: https://stackoverflow.com/questions/55664420/page-evaluate-vs-puppeteer-methods. To begin with, you may prefer `page.evaluate` because it is easier to debug. – Daniel Hudač May 13 '22 at 13:06
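
To make the comparison in these comments concrete, here are the two forms side by side as a rough sketch. The function names `textsViaEvaluate` and `textsViaEval` are made up for illustration, and `page` is assumed to be a Puppeteer page. In both forms the callback runs in the browser context and returns a plain array of strings, which is serializable.

```javascript
// Form 1: page.evaluate runs the callback in the page; you query the DOM yourself.
// The selector is passed as an argument because the callback cannot see Node variables.
function textsViaEvaluate(page, selector) {
  return page.evaluate(
    sel => Array.from(document.querySelectorAll(sel)).map(e => e.textContent),
    selector
  );
}

// Form 2: page.$$eval runs querySelectorAll for you and passes the matches
// to the callback as an array.
function textsViaEval(page, selector) {
  return page.$$eval(selector, els => els.map(e => e.textContent));
}
```

Both return a promise for the same array, so the choice is mostly about readability. Neither form waits for the elements to appear, so the page still has to be loaded before calling either.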