I am trying to create a basic script that scrolls down to the bottom of the Hacker News site. The scrolling implementation was taken from this Stack Overflow question (2nd answer by kimbaudi, 1st method).

The implementation works by constantly measuring the .length of a list of elements (as provided by a selector) while scrolling, to figure out if the browser has successfully scrolled to the bottom of said list of elements.

For my selector, I chose the HTML element housing each article on Hacker News, tr.athing, with the intent of scrolling down to the bottom-most article link. However, even though tr.athing does match elements (their text is printed by the code below), I get the following error:

Error: Error: failed to find element matching selector "tr.athing:last-child"

What is going wrong?

const puppeteer = require("puppeteer");
const cheerio = require('cheerio');

const link = 'https://news.ycombinator.com/';

// 2 functions used in scrolling
async function getCount(page) {
  await console.log(page.$$eval("tr.athing", a => a.length));
  return await page.$$eval("tr.athing", a => a.length);
}

async function scrollDown(page) {
  await page.$eval("tr.athing:last-child", e => {
    e.scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}


// puppeteer usage as normal
puppeteer.launch({ headless: false }).then(async browser => {

  const page = await browser.newPage();
  const navigationPromise = page.waitForNavigation();
  await page.setViewport({ width: 1500, height: 800 });

  // Loading page
  await page.goto(link);
  await navigationPromise;
  await page.waitFor(1000);

  // Using cheerio to inject jquery into page.
  const html = await page.content();
  const $ = await cheerio.load(html);

  // This works
  var selection = $('tr.athing').text();

  await console.log('\n');
  await console.log(selection);
  await console.log('\n');

  // Error, this does not work for some reason;
  // scrolling code starts here.
  const delay = 10000;
  let preCount = 0;
  let postCount = 0;

  do {
    preCount = getCount(page);
    scrollDown(page);
    page.waitFor(delay);
    postCount = getCount(page);
  } while (postCount > preCount);
  page.waitFor(delay);


//  await browser.close();

})
halfer
Coolio2654

1 Answer

The :last-child selector does not give you the last element in the list of matches. It matches an element only if that element is the last child of its parent.

The :last-child selector matches every element that is the last child of its parent.
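To make the difference concrete, here is a minimal sketch that models the children of a single parent as a plain array rather than a real DOM. The row layout is an assumption for illustration (rows with class athing interleaved with rows that lack it, ending in a row with no class), and lastChildMatch and lastMatch are hypothetical helper names, not DOM or Puppeteer APIs.

```javascript
// Simplified stand-in for the <tr> children of one <tbody>.
// Assumed layout: the final row carries no class.
const rows = [
  { tag: 'tr', classes: ['athing'] },
  { tag: 'tr', classes: [] },
  { tag: 'tr', classes: ['athing'] },
  { tag: 'tr', classes: [] },
  { tag: 'tr', classes: [] }, // last child of the parent, no "athing" class
];

// Models "tr.athing:last-child": the element must have the class AND be the
// last child of its parent. Returns the matching index, or -1 for no match.
function lastChildMatch(children, cls) {
  const last = children.length - 1;
  return last >= 0 && children[last].classes.includes(cls) ? last : -1;
}

// Models "collect every tr.athing, then take the final one in that list".
function lastMatch(children, cls) {
  for (let i = children.length - 1; i >= 0; i--) {
    if (children[i].classes.includes(cls)) return i;
  }
  return -1;
}

console.log(lastChildMatch(rows, 'athing')); // -1
console.log(lastMatch(rows, 'athing'));      // 2
```

In this model the combined selector finds nothing because the parent's last child lacks the class, while picking the final entry of all matches still finds a row.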

You could do something like this instead:

async function scrollDown(page) {
  await page.$$eval("tr.athing", els => {
    els[els.length - 1].scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}
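If the selector matches nothing (for example, because the rows have not rendered yet), els[els.length - 1] is undefined and calling scrollIntoView on it throws. A minimal sketch of a guarded variant follows; scrollLastIntoView is a hypothetical helper name, and returning a boolean is my own choice so the caller can tell whether anything was scrolled.

```javascript
// Passed to page.$$eval, so it runs in the browser context and must not
// reference variables from the Node side.
function scrollLastIntoView(els) {
  if (els.length === 0) return false; // nothing matched: skip instead of throwing
  els[els.length - 1].scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  return true;
}

// Resolves to true if an element was scrolled into view, false otherwise.
async function scrollDown(page) {
  return page.$$eval('tr.athing', scrollLastIntoView);
}
```

The caller can then decide what to do when scrollDown resolves to false, such as waiting and retrying.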

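Also note that several awaits are missing in your loop. getCount is an async function, so without await it returns a pending Promise, and postCount > preCount compares two Promises rather than two numbers: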

do {
    preCount = await getCount(page);
    await scrollDown(page);
    await page.waitFor(delay);
    postCount = await getCount(page);
} while (postCount > preCount);
await page.waitFor(delay);
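To illustrate why the awaits matter, here is a sketch of the same loop with the counting and scrolling steps passed in as functions, so it can be run without a browser. scrollUntilStable and sleep are hypothetical helper names, and the fake page that gains 30 rows per scroll is an assumption made purely for the demonstration.

```javascript
// Without await, both operands are Promise objects, so ">" never sees numbers.
console.log(Promise.resolve(60) > Promise.resolve(30)); // false

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Keeps scrolling until a scroll no longer increases the element count.
// getCount and scrollDown are injected, so the loop does not depend on Puppeteer.
async function scrollUntilStable(getCount, scrollDown, delayMs) {
  let preCount = 0;
  let postCount = 0;
  do {
    preCount = await getCount();
    await scrollDown();
    await sleep(delayMs);
    postCount = await getCount();
  } while (postCount > preCount);
  return postCount;
}

// Stand-in for a page that loads 30 more rows per scroll, up to 90.
let count = 30;
const fakeGetCount = async () => count;
const fakeScrollDown = async () => { count = Math.min(count + 30, 90); };

scrollUntilStable(fakeGetCount, fakeScrollDown, 10).then(total => {
  console.log(total); // 90
});
```

With Puppeteer, the same function could be called as scrollUntilStable(() => getCount(page), () => scrollDown(page), delay), reusing the two helpers from the question.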
hardkoded
  • Thanks for at least clearing up my understanding of that method, that `last-child` takes the `last` sibling, essentially, hardkoded! However, having inputted your suggestions, I now get a different error in my app (the previous one assumedly having been vanquished): `UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'scrollIntoView' of undefined`. Would you also help me with this one? – Coolio2654 Jul 15 '19 at 17:18
  • @Coolio2654 maybe in the first loop you have no records yet. You could add some `if(els.length)` before calling scrollIntoView. – hardkoded Jul 15 '19 at 18:39
  • Ok, I solved the error that was in my code before, and now it runs! Thank you very much. Could you tell me why using `:last-child` originally was not working? It seems to me that should have still caused a scroll event to occur, even if it wasn't to the last instance of the selector (as originally intended), but to some other element. – Coolio2654 Jul 15 '19 at 20:29
  • It doesn't work because there is no TR with the class `athing` that is the `last-child` of its parent. If you look at the HTML, the last child of the parent (a TBODY) is a TR with no class. If the answer helped you, an upvote will be appreciated :) – hardkoded Jul 15 '19 at 21:30