
I am using Puppeteer to scrape a website, but the elements I select keep coming back as empty objects, even though I can see that there are many of them on the page. Any advice?

I am looking for elements with the class "portal-type-person". There are about 90 on the page, but every returned object is empty.

```javascript
const puppeteer = require('puppeteer');
const mainurl = "https://www.fbi.gov/wanted/kidnap";

(async () => {
    //const browser = await puppeteer.launch({headless: false});
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto(mainurl);
    // Scroll to the bottom of the page.
    await page.evaluate(() => {
        window.scrollBy(0, document.body.scrollHeight);
    });
    // Pause for one second (page.waitForTimeout() was removed in newer Puppeteer versions).
    await new Promise((resolve) => setTimeout(resolve, 1000));

    let persons = await page.evaluate(() => {
        // Returns a NodeList of DOM elements.
        return document.querySelectorAll('.portal-type-person');
        //return document.querySelector('.portal-type-person');
    });

    //console.log(persons);
    for (let data in persons) {
        console.log(persons[data]); // logs {} for every match
    }

    await browser.close();
})();
```
Asked by JaySnel; edited by vsemozhebuty.

1 Answer


Unfortunately, page.evaluate() can only transfer serializable values (roughly, the values JSON can handle). document.querySelectorAll() returns a collection of DOM elements, which are not serializable (they contain methods and circular references), so each element in the collection is replaced with an empty object. You need to either return a serializable value (for example, an array of hrefs) or use something like page.$$(selector) together with the ElementHandle API.
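The effect can be reproduced without a browser. The sketch below is an analogue, not Puppeteer's actual transfer code: it assumes a JSON round trip as a stand-in for the page.evaluate() boundary, and `FakeElement` is a hypothetical class whose `textContent` is a getter on the prototype backed by a private field, much as a real DOM node exposes its data through prototype getters rather than own properties.

```javascript
// Hypothetical stand-in for a DOM element: its data lives behind a prototype
// getter and a private field, so it has no own enumerable properties.
class FakeElement {
  #text;
  constructor(text) {
    this.#text = text;
  }
  get textContent() {
    return this.#text;
  }
}

// Assumption: a JSON round trip approximates what crossing the
// page.evaluate() boundary does to the returned value.
function transfer(value) {
  return JSON.parse(JSON.stringify(value));
}

const elements = [new FakeElement('Jane Doe'), new FakeElement('John Roe')];

// Returning the elements themselves: each one arrives as an empty object.
const raw = transfer(elements);
console.log(raw); // [ {}, {} ]

// Mapping to plain strings before returning keeps the data.
const names = transfer(elements.map((el) => el.textContent));
console.log(names); // [ 'Jane Doe', 'John Roe' ]
```

This is why the fix is to do the extraction inside the callback and return only strings, numbers, arrays, or plain objects.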

Answered by vsemozhebuty.
page.$$(selector) didn't work for me (I could have been doing something wrong), but that research led me back to page.evaluate(), and I got this to work: `let persons = await page.evaluate(() => Array.from(document.querySelectorAll('.portal-type-person > .title'), element => element.textContent));` – JaySnel Oct 29 '20 at 17:50
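For comparison, here is a sketch of the other two routes, assuming a recent Puppeteer release and the `.portal-type-person > .title` selector from the comment above. page.$$eval() runs a callback in the page over every match and returns only its serializable result; page.$$() returns ElementHandle objects that stay references to nodes inside the browser, so each value has to be read out with handle.evaluate(). The helper names `titlesViaEval`, `titlesViaHandles` and `main` are hypothetical.

```javascript
// Selector taken from the comment above; assumed to match the person names.
const SELECTOR = '.portal-type-person > .title';

// Option 1: page.$$eval() runs the callback in the page over all matches and
// sends back only its return value, here an array of strings.
async function titlesViaEval(page) {
  return page.$$eval(SELECTOR, (els) => els.map((el) => el.textContent.trim()));
}

// Option 2: page.$$() returns ElementHandles; read each value with
// handle.evaluate(), which also returns only serializable data.
async function titlesViaHandles(page) {
  const handles = await page.$$(SELECTOR);
  const titles = [];
  for (const handle of handles) {
    titles.push(await handle.evaluate((el) => el.textContent.trim()));
    await handle.dispose(); // release the in-page reference
  }
  return titles;
}

// Runs both helpers against the live page (needs Puppeteer and network access).
async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('https://www.fbi.gov/wanted/kidnap');
    await page.waitForSelector(SELECTOR);
    console.log(await titlesViaEval(page));
    console.log(await titlesViaHandles(page));
  } finally {
    await browser.close();
  }
}

module.exports = { titlesViaEval, titlesViaHandles, main };
```

main() is not invoked automatically; call `main().catch(console.error)` from a script to run it. Either way, no DOM node crosses the boundary, only the extracted strings.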