I am using puppeteer to scrape some images off a site along with some other data. To change images I need to hover over a list item. I keep coming across documentation around .hover() but have had no success. However, .click() works perfectly for another part of my scrape.

const pptr = require('puppeteer');

async function scrapeProduct(productID) {
    const browser = await pptr.launch();
    const page = await browser.newPage();
    await page.goto(`https://someplace.com`);
    let scrapeData = await page.evaluate(async () => {
        let productMap = [];

        //scrape other data...

        const imageItems = document.querySelectorAll('ul[class="images-view-list"] > li > div');
        for (let image of imageItems) {
            await image.hover();
            productMap.push({
                'Image Src': document.querySelector('div[class="image-view-magnifier-wrap"] > img').getAttribute('src'),
            });
        }
        return productMap;
    });
    await browser.close();
    return scrapeData;
}

I've seen solutions where the hover is executed first and the page is evaluated afterwards. That is inconvenient here, as I collect many other data points and would like to keep everything in a single evaluate call. Am I misunderstanding .hover()?

Maksym Moros

1 Answer

You're mixing up Puppeteer functions with evaluated functions that execute in the DOM context. Inside page.evaluate, `image` is a plain DOM element, which has no hover() method. If you want to use Puppeteer's hover, you need the element handles returned by a page.$$ query:

let productMap = [];
const page = await browser.newPage();
await page.goto(`https://someplace.com`);
//get a collection of Puppeteer element handles
const imageItems = await page.$$('ul[class="images-view-list"] > li > div');
for (let image of imageItems) {
    //hover on each element handle
    await image.hover();
    //the hovered div has no src of its own, so read it from the magnified image
    const magnified = await page.$('div[class="image-view-magnifier-wrap"] > img');
    productMap.push({'Image Src': await (await magnified.getProperty('src')).jsonValue()});
}

If you need to do it in the page.evaluate function, you will need to mimic the hover by using normal DOM mouse events.
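
For the page.evaluate route, here is a minimal sketch of mimicking the hover with synthetic DOM mouse events. It reuses the selectors from the question. `collectImageSources` and `scrapeImageSources` are hypothetical names, and the sketch assumes the site swaps the image from a JavaScript mouse listener and does so synchronously. Synthetic events do not trigger CSS `:hover` styles, so this will not help if the swap is done purely in CSS.

```javascript
// Runs in the browser (DOM) context when passed to page.evaluate.
// Hypothetical helper; it must not reference anything from the Node scope.
function collectImageSources() {
    const hoverEvents = ['mouseenter', 'mouseover', 'mousemove'];
    const items = document.querySelectorAll('ul[class="images-view-list"] > li > div');
    const productMap = [];
    for (const item of items) {
        for (const type of hoverEvents) {
            // mouseenter does not bubble in real browsers; the other two do
            item.dispatchEvent(new MouseEvent(type, {
                bubbles: type !== 'mouseenter',
                cancelable: true,
            }));
        }
        // Assumption: the page has already updated the magnified image here
        const img = document.querySelector('div[class="image-view-magnifier-wrap"] > img');
        productMap.push({ 'Image Src': img ? img.getAttribute('src') : null });
    }
    return productMap;
}

// `page` is assumed to be a Puppeteer Page that has already navigated.
async function scrapeImageSources(page) {
    return page.evaluate(collectImageSources);
}
```

Passing the function by reference works because Puppeteer serializes it and runs it in the page. If the site updates the image asynchronously, a wait between the dispatch and the read would be needed.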

The reason the click() method appears to work is that it is available in both contexts, as a native DOM method and a Puppeteer element handle method.

Tom
  • Beware that `image.getProperty('src')` returns a promise for a JSHandle. To get the src string, you need a slightly longer expression: `{'Image Src': await (await image.getProperty('src')).jsonValue()}`. – vsemozhebuty Sep 05 '20 at 07:58
  • This was incredibly informative, thank you. I was noticing some strange behaviour in the evaluate block. When returning data from evaluate, how can I manipulate it in Node.js? I return an array of product maps, but when I try to manipulate the data, I get [Object] everywhere. – Maksym Moros Sep 05 '20 at 20:05
  • Same type of problem; see https://stackoverflow.com/questions/53032903/get-elements-from-page-evaluate-in-puppeteer for details. You're trying to pass data from the DOM context across to the Puppeteer / Node context, and only a serializable value can make that trip. So return `JSON.stringify(array)` from evaluate and call `JSON.parse(returnValue)` on the Node side to work with it. – Tom Sep 06 '20 at 07:05
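
On the `[Object]` output mentioned in the comments: a small sketch, assuming the placeholders come from `console.log` abbreviating deeply nested values rather than from lost data. `scrapeData` is a made-up stand-in for what `page.evaluate` returned, nested more deeply than the flat array in the question.

```javascript
// Made-up stand-in for the array returned by page.evaluate
const scrapeData = [
    { product: { images: [{ 'Image Src': 'https://someplace.com/a.jpg' }] } },
];

// Round trip suggested in the comment above: a string in the page, an object in Node
const serialized = JSON.stringify(scrapeData);
const restored = JSON.parse(serialized);

// Plain JavaScript access works on the restored value
const firstSrc = restored[0].product.images[0]['Image Src'];

// Pretty-printing as JSON shows every nesting level in full
const printable = JSON.stringify(restored, null, 2);
console.log(printable);
```

If the data prints in full this way, it crossed the boundary intact and only the default logging was hiding it.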