15

I'm trying to achieve something very trivial: Get a list of elements, and then do something with the innerText of each element.

const tweets = await page.$$('.tweet');

From what I can tell, this returns a nodelist, just like the document.querySelectorAll() method in the browser.

How do I just loop over it and get what I need? I tried various stuff, like:

[...tweets].forEach(tweet => {
  console.log(tweet.innerText)
});
Grant Miller
i.brod
  • Before trying `forEach`, did you try a `for` loop, like `for (var i = 0; i < tweets.length; i++) { console.log(tweets[i].innerText) }`? – Akhil Aravind Oct 16 '18 at 03:50

2 Answers

40

page.$$():

You can use a combination of elementHandle.getProperty() and jsHandle.jsonValue() to obtain the innerText from an ElementHandle obtained with page.$$():

const tweets = await page.$$('.tweet');

for (let i = 0; i < tweets.length; i++) {
  const tweet = await (await tweets[i].getProperty('innerText')).jsonValue();
  console.log(tweet);
}

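If you are set on using the forEach() method, you can wrap the loop in a promise. Because forEach() does not wait for async callbacks, resolve only once every callback has finished, not when the last index is reached:

const tweets = await page.$$('.tweet');

await new Promise((resolve, reject) => {
  // Count outstanding callbacks; the last index may not be the last to finish.
  let remaining = tweets.length;
  if (remaining === 0) {
    resolve();
  }
  tweets.forEach(async tweet => {
    try {
      const text = await (await tweet.getProperty('innerText')).jsonValue();
      console.log(text);
      if (--remaining === 0) {
        resolve();
      }
    } catch (err) {
      reject(err);
    }
  });
});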
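
As a possible alternative to both loops above, the per-handle reads can be started with Array.prototype.map() and awaited together with Promise.all(). The following is only a sketch: readInnerTexts is a hypothetical helper name, and page is assumed to be a Puppeteer Page object.

```javascript
// Sketch: read innerText from every matching handle concurrently.
// Assumption: `page` is a Puppeteer Page; `readInnerTexts` is a hypothetical name.
async function readInnerTexts(page, selector) {
  // page.$$() resolves to an array of ElementHandle objects.
  const handles = await page.$$(selector);

  // map() starts one async read per handle; Promise.all() waits for all of
  // them and keeps the results in the same order as the handles.
  return Promise.all(
    handles.map(async (handle) => {
      const property = await handle.getProperty('innerText');
      return property.jsonValue();
    })
  );
}
```

Usage would look like `const tweets = await readInnerTexts(page, '.tweet');`, after which tweets is a plain array of strings that can be used with forEach().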

page.evaluate():

Alternatively, you can skip using page.$$() entirely, and use page.evaluate():

const tweets = await page.evaluate(() => Array.from(document.getElementsByClassName('tweet'), e => e.innerText));

tweets.forEach(tweet => {
  console.log(tweet);
});
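
One detail worth noting: the function given to page.evaluate() runs inside the page, so variables from the Node.js side are not in scope there and have to be passed as extra arguments. A minimal sketch, where textsBySelector is a hypothetical helper name and page is assumed to be a Puppeteer Page:

```javascript
// Sketch: collect innerText for an arbitrary selector via page.evaluate().
// Assumption: `page` is a Puppeteer Page; `textsBySelector` is a hypothetical name.
async function textsBySelector(page, selector) {
  return page.evaluate(
    // This function is executed in the browser context, where `document` exists.
    // `sel` receives the value of `selector` passed as the second argument.
    (sel) => Array.from(document.querySelectorAll(sel), (el) => el.innerText),
    selector
  );
}
```

Called as `await textsBySelector(page, '.tweet')`, this should give the same kind of string array as the example above.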
Grant Miller
  • I really like the last example. I wasn't even aware of the Array.from() method. – i.brod Oct 16 '18 at 11:04
  • Is there any way to get rid of the 'for' loop and use an array transformation to build the array of innerText values in the `page.$$` sample? – algot Apr 15 '19 at 22:48
  • +1 been using arrays for ages.. never knew about Array.from! so much better than writing Array.prototype.slice.call! – 4UmNinja Oct 14 '19 at 01:07
  • The page.$$ version just looks awful, I just don't understand the use case for it? – Omiron May 27 '20 at 00:34
15

According to the Puppeteer docs, page.$$() does not return a NodeList. It returns a Promise that resolves to an array of ElementHandle objects, which is very different from a NodeList.

There are several ways to solve the problem.

1. Use page.$$eval, the built-in method for working on all matching elements

This method runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.

To get the innerText of each element, map over that array inside the callback:

// Find all .tweet elements and return the innerText of each one, in an array.
const tweets = await page.$$eval('.tweet', elements => elements.map(element => element.innerText));
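
Since page.$$eval hands the page function the whole array of matches, any array processing can happen in that one call. The sketch below trims each text and drops empty entries; nonEmptyTexts is a hypothetical helper name, and page is assumed to be a Puppeteer Page.

```javascript
// Sketch: collect trimmed, non-empty innerText values in a single $$eval call.
// Assumption: `page` is a Puppeteer Page; `nonEmptyTexts` is a hypothetical name.
async function nonEmptyTexts(page, selector) {
  return page.$$eval(selector, (elements) =>
    // `elements` is an array of every element matching the selector.
    elements
      .map((element) => element.innerText.trim())
      .filter((text) => text.length > 0)
  );
}
```

For example, `const tweets = await nonEmptyTexts(page, '.tweet');` would return only the tweets that contain visible text.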

2. Pass the ElementHandle to page.evaluate

Whatever you get from await page.$$('.tweet') is an array of ElementHandle objects. If you log one to the console, it will show as JSHandle or ElementHandle depending on the type.

Forget the hard explanation, it's easier to demonstrate.

// Let's call them tweetHandles.
const tweetHandles = await page.$$('.tweet');

// Loop through all handles.
for (const tweetHandle of tweetHandles) {
  // Pass the single handle as an argument to the page function.
  const singleTweet = await page.evaluate(el => el.innerText, tweetHandle);

  // Do whatever you want with the data.
  console.log(singleTweet);
}
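
A variation on the loop above, offered as a sketch only: more recent Puppeteer versions also expose an evaluate() method on the handle itself, and handles can be released with dispose() once they are no longer needed. Here textsFromHandles is a hypothetical helper name, and the handles are assumed to come from page.$$().

```javascript
// Sketch: read innerText from each handle, then release the handle.
// Assumptions: `handles` comes from page.$$(); the installed Puppeteer version
// supports elementHandle.evaluate(); `textsFromHandles` is a hypothetical name.
async function textsFromHandles(handles) {
  const texts = [];
  for (const handle of handles) {
    try {
      // The element behind the handle is passed to the function as `el`.
      texts.push(await handle.evaluate((el) => el.innerText));
    } finally {
      // Release the reference even if reading the text failed.
      await handle.dispose();
    }
  }
  return texts;
}
```

Usage: `const tweets = await textsFromHandles(await page.$$('.tweet'));`. The handles should not be reused afterwards, because they have been disposed.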

Of course, there are multiple ways to solve this problem. Grant Miller also covered a few of them in the other answer.

Md. Abu Taher
  • Yes, i did more or less what u demonstrated in the second example. Boy does Puppeteer seem overly complicated...I truly wonder how come its mechanics are so different to what we're used to in the browser. I mean, isn't it just a "virtual browser"? – i.brod Oct 16 '18 at 11:08
  • It is a browser, but the thing is, telling a human to use a browser is a lot different than telling a computer. :D I thought I put the simplest example available for your problem. Sigh. – Md. Abu Taher Oct 16 '18 at 12:00
  • For `.$$eval`, the second argument callback is not applied to each element, it's applied to all elements. So `page.$$eval('.tweet', elements => elements.map(e => e.innerText))` – ggorlen Dec 28 '20 at 05:34