100

I am in a situation where new content is created when I scroll down. The new content has a specific class name.

How can I keep scrolling down until all the elements have loaded?

In other words, I want to reach the stage where if I keep scrolling down, nothing new will load.

I was using code to scroll down, coupled with:

await page.waitForSelector('.class_name');

The problem with this approach is that after all the elements have loaded, the code keeps on scrolling down, no new elements are created and eventually I get a timeout error.

This is the code:

await page.evaluate( () => {
  window.scrollBy(0, window.innerHeight);
});
await page.waitForSelector('.class_name');
ggorlen
user1584421
  • It sounds like there might be an issue with the code you use to scroll down. Can you please add that to your question? – Grant Miller Jul 26 '18 at 03:23
  • `if i keep scrolling down, nothing new will load` Define "nothing new will load" and check for that in your code. Also, timeouts can be redefined. But yes, Grant Miller is right, please provide your code and, ideally, the target site URL. – Vaviloff Jul 26 '18 at 08:28
  • Thanks a lot! I updated the code. Since it is a local site, I cannot post a URL though... 'Nothing new will load' means the website has loaded all the available elements, so when I keep scrolling down and using page.waitForSelector(), no new elements appear and my code waits until it throws a timeout error. – user1584421 Jul 26 '18 at 09:57
  • you could try this `await page.evaluate('window.scrollTo(0, document.body.scrollHeight)')` – Ondrej Kvasnovsky Oct 16 '18 at 17:53

11 Answers

164

Give this a shot:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.goto('https://www.yoursite.com');
    await page.setViewport({
        width: 1200,
        height: 800
    });

    await autoScroll(page);

    await page.screenshot({
        path: 'yoursite.png',
        fullPage: true
    });

    await browser.close();
})();

async function autoScroll(page){
    await page.evaluate(async () => {
        await new Promise((resolve) => {
            var totalHeight = 0;
            var distance = 100;
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;

                if(totalHeight >= scrollHeight - window.innerHeight){
                    clearInterval(timer);
                    resolve();
                }
            }, 100);
        });
    });
}

Source: https://github.com/chenxiaochun/blog/issues/38

EDIT

added window.innerHeight to the calculation because the available scrolling distance is body height minus viewport height, not the entire body height.
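As a rough illustration of that arithmetic: a body that is 3000px tall inside an 800px-tall viewport can only be scrolled by 2200px, so the loop should stop once it has scrolled that far. The helper names below are hypothetical and not part of Puppeteer; they only restate the stop condition used in autoScroll.

```javascript
// Hypothetical helpers (not part of Puppeteer) that restate the stop condition.

// Largest distance the page can be scrolled: body height minus viewport height.
function maxScrollOffset(scrollHeight, viewportHeight) {
  return Math.max(0, scrollHeight - viewportHeight);
}

// True once the accumulated scroll distance covers the scrollable range.
function hasReachedBottom(totalScrolled, scrollHeight, viewportHeight) {
  return totalScrolled >= maxScrollOffset(scrollHeight, viewportHeight);
}

console.log(maxScrollOffset(3000, 800));        // 2200
console.log(hasReachedBottom(2100, 3000, 800)); // false
console.log(hasReachedBottom(2200, 3000, 800)); // true
```

Inside page.evaluate, the same check would use document.body.scrollHeight and window.innerHeight as its inputs.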

EDIT 2

Sure, Dan (from the comments). To add a counter that stops the scrolling, introduce a variable that is incremented on each iteration. When it reaches a certain value (say, 50 scrolls), clear the interval and resolve the promise.

Here's the modified code with a scrolling limit set to 50:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.goto('https://www.yoursite.com');
    await page.setViewport({
        width: 1200,
        height: 800
    });

    await autoScroll(page, 50);  // set limit to 50 scrolls

    await page.screenshot({
        path: 'yoursite.png',
        fullPage: true
    });

    await browser.close();
})();

async function autoScroll(page, maxScrolls){
    await page.evaluate(async (maxScrolls) => {
        await new Promise((resolve) => {
            var totalHeight = 0;
            var distance = 100;
            var scrolls = 0;  // scrolls counter
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;
                scrolls++;  // increment counter

                // stop scrolling if reached the end or the maximum number of scrolls
                if(totalHeight >= scrollHeight - window.innerHeight || scrolls >= maxScrolls){
                    clearInterval(timer);
                    resolve();
                }
            }, 100);
        });
    }, maxScrolls);  // pass maxScrolls to the function
}

Cory
  • A 100 ms interval is too fast; it just skipped the whole autoscroll, so I had to use 400. Is there any way to detect a class or element appearing before stopping the autoscroll? – CodeGuru Jan 12 '19 at 15:56
  • When you're `evaluate`ing you have a reference to the document context. So you would just use a standard selector, and check its position using `getBoundingClientRect`. – Cory Jan 14 '19 at 22:02
  • @CodeGuru it is possible to stop the autoscroll using classname but you need to use `scrollIntoView` instead of `scrollBy`, which means you need a reference to the element to scroll into that will possibly generate more content at the bottom of the page. Then, you can compare the # of classnames before scrolling into view vs after scrolling into view. If the # of classnames increases after scrolling into view, more content has been generated so you can scroll more. Otherwise, no more content has been generated so stop scrolling. Hope that makes sense. – kimbaudi Jun 17 '19 at 09:55
  • Error: Protocol error (Runtime.callFunctionOn): Target closed. I'm scrolling down a page that loads new data every time you scroll down. It works for a while, but after some time this error pops up. – Omar Aug 04 '19 at 18:51
  • This works perfectly when I run it locally on my Windows machine, but when I upload it to my Linux VPS, it does not load any more items than the initial ones. – Iqbal Sep 30 '19 at 17:35
  • Iqbal: It could be related to your xvfb. Try changing `headless: false` to `headless: true` – Cory Oct 03 '19 at 00:53
  • Where is `window` defined? – Jannis Ioannou Apr 05 '21 at 23:48
  • @JannisIoannou, take a look at this [MDN](https://developer.mozilla.org/en-US/docs/Web/API/Window). `window` is a global browser object, representing the window in which the script is running. If you're referencing `window` in Node, you'll get an error. – Cory Apr 06 '21 at 17:27
  • @Cory I'm talking about node context, where puppeteer is launched from. Where is window defined in the above `node` code? – Jannis Ioannou Apr 06 '21 at 18:35
  • @JannisIoannou: To execute JavaScript code on your puppeteer instance, you use the evaluate method. Think of code running inside evaluate as if you are running it in a browser console. In this case `window` is automatically created when evaluate is called. Please take a look at the [evaluate](https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pageevaluatepagefunction-args) method for additional context. – Cory Apr 22 '21 at 17:52
  • Please add some explanation on what your code is doing (at least code comments). Posting a 40-line code snippet without comments is not helping too much – Philipp Aug 09 '21 at 16:28
  • @cory There should probably be some kind of counter-based mechanism to stop the scrolling, as some websites (https://www.barstoolsports.com/) scroll indefinitely. – Dan May 06 '23 at 00:23
  • @dan. Updated the code in Edit 2. – Cory Jul 25 '23 at 01:22
  • @Philipp I appreciate the feedback. Added some comments to help the new to puppeteer folks. – Cory Jul 25 '23 at 01:23
  • It worked for me, but with some adjustments. I used some specific elements in the page to verify if the page was loaded. – Guilherme Sampaio Aug 24 '23 at 14:22
42

Scrolling down to the bottom of the page can be accomplished in 2 ways:

  1. use scrollIntoView (to scroll to the part of the page that can create more content at the bottom) and selectors (i.e., document.querySelectorAll('.class_name').length to check whether more content has been generated)
  2. use scrollBy (to incrementally scroll down the page) and either setTimeout or setInterval (to incrementally check whether we are at the bottom of the page)

Here is an implementation using scrollIntoView and selector (assuming .class_name is the selector that we scroll into for more content) in plain JavaScript that we can run in the browser:

Method 1: use scrollIntoView and selectors

const delay = 3000;
const wait = (ms) => new Promise(res => setTimeout(res, ms));
const count = async () => document.querySelectorAll('.class_name').length;
const scrollDown = async () => {
  document.querySelector('.class_name:last-child')
    .scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
}

let preCount = 0;
let postCount = 0;
do {
  preCount = await count();
  await scrollDown();
  await wait(delay);
  postCount = await count();
} while (postCount > preCount);
await wait(delay);

In this method, we are comparing the # of .class_name selectors before scrolling (preCount) vs after scrolling (postCount) to check whether we are at bottom of page:

if (postCount > preCount) {
  // NOT bottom of page
} else {
  // bottom of page
}

And here are 2 possible implementations using either setTimeout or setInterval with scrollBy in plain JavaScript that we can run in the browser console:

Method 2a: use setTimeout with scrollBy

const distance = 100;
const delay = 100;
while (document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight) {
  document.scrollingElement.scrollBy(0, distance);
  await new Promise(resolve => { setTimeout(resolve, delay); });
}

Method 2b: use setInterval with scrollBy

const distance = 100;
const delay = 100;
const timer = setInterval(() => {
  document.scrollingElement.scrollBy(0, distance);
  if (document.scrollingElement.scrollTop + window.innerHeight >= document.scrollingElement.scrollHeight) {
    clearInterval(timer);
  }
}, delay);

In this method, we are comparing document.scrollingElement.scrollTop + window.innerHeight with document.scrollingElement.scrollHeight to check whether we are at the bottom of the page:

if (document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight) {
  // NOT bottom of page
} else {
  // bottom of page
}

If any of the JavaScript snippets above scrolls the page all the way down to the bottom when run in the browser console, then we know it is working and we can automate it using Puppeteer.

Here are the sample Puppeteer Node.js scripts that will scroll down to the bottom of the page and wait a few seconds before closing the browser.

Puppeteer Method 1: use scrollIntoView with selector (.class_name)

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  const delay = 3000;
  let preCount = 0;
  let postCount = 0;
  do {
    preCount = await getCount(page);
    await scrollDown(page);
    await new Promise(resolve => setTimeout(resolve, delay)); // page.waitFor() is not available in recent Puppeteer releases
    postCount = await getCount(page);
  } while (postCount > preCount);
  await new Promise(resolve => setTimeout(resolve, delay));

  await browser.close();
})();

async function getCount(page) {
  return await page.$$eval('.class_name', a => a.length);
}

async function scrollDown(page) {
  await page.$eval('.class_name:last-child', e => {
    e.scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}

Puppeteer Method 2a: use setTimeout with scrollBy

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  await scrollToBottom(page);
  await new Promise(resolve => setTimeout(resolve, 3000)); // page.waitFor() is not available in recent Puppeteer releases

  await browser.close();
})();

async function scrollToBottom(page) {
  const distance = 100; // should be less than or equal to window.innerHeight
  const delay = 100;
  while (await page.evaluate(() => document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight)) {
    await page.evaluate((y) => { document.scrollingElement.scrollBy(0, y); }, distance);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}

Puppeteer Method 2b: use setInterval with scrollBy

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  await page.evaluate(scrollToBottom);
  await new Promise(resolve => setTimeout(resolve, 3000)); // page.waitFor() is not available in recent Puppeteer releases

  await browser.close();
})();

async function scrollToBottom() {
  await new Promise(resolve => {
    const distance = 100; // should be less than or equal to window.innerHeight
    const delay = 100;
    const timer = setInterval(() => {
      document.scrollingElement.scrollBy(0, distance);
      if (document.scrollingElement.scrollTop + window.innerHeight >= document.scrollingElement.scrollHeight) {
        clearInterval(timer);
        resolve();
      }
    }, delay);
  });
}
kimbaudi
14

Based on the answer from this URL:

await page.evaluate(() => {
  window.scrollTo(0, window.document.body.scrollHeight);
});
Jonathan Lin
x-magix
  • `window.innerHeight` doesn't scroll all the way to the bottom, but with `window.scrollTo(0,window.document.body.scrollHeight)` it does. – K. Frank Nov 16 '20 at 03:43
9

Much easier:

    await page.evaluate(async () => {
      let scrollPosition = 0
      let documentHeight = document.body.scrollHeight

      while (documentHeight > scrollPosition) {
        window.scrollBy(0, documentHeight)
        await new Promise(resolve => {
          setTimeout(resolve, 1000)
        })
        scrollPosition = documentHeight
        documentHeight = document.body.scrollHeight
      }
    })
llobet
8

Many solutions here assume that the page height is constant. This implementation works even if the page height changes (e.g., when new content loads as the user scrolls down): it keeps scrolling until the scroll position stops changing.

await page.evaluate(() => new Promise((resolve) => {
  var scrollTop = -1;
  const interval = setInterval(() => {
    window.scrollBy(0, 100);
    if(document.documentElement.scrollTop !== scrollTop) {
      scrollTop = document.documentElement.scrollTop;
      return;
    }
    clearInterval(interval);
    resolve();
  }, 10);
}));
kimbaudi
nagy.zsolt.hun
7

A similar solution to @EdvinTr's that is giving me great results. It scrolls and then compares the page's Y offset with the previous value, which is very simple.

let originalOffset = 0;
while (true) {
    await page.evaluate('window.scrollBy(0, document.body.scrollHeight)');
    await new Promise(resolve => setTimeout(resolve, 200)); // page.waitForTimeout() was removed in newer Puppeteer releases
    let newOffset = await page.evaluate('window.pageYOffset');
    if (originalOffset === newOffset) {
        break;
    }
    originalOffset = newOffset;
}
Raoul Boulos
6

Pretty simple solution

let lastHeight = await page.evaluate('document.body.scrollHeight');

while (true) {
    await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
    // sleep a bit (page.waitForTimeout() was removed in newer Puppeteer releases)
    await new Promise(resolve => setTimeout(resolve, 2000));
    let newHeight = await page.evaluate('document.body.scrollHeight');
    if (newHeight === lastHeight) {
        break; // the height stopped growing, so nothing new was loaded
    }
    lastHeight = newHeight;
}
EdvinTr
4

You might just use the following code using page.keyboard object:

await page.keyboard.press('ArrowDown');
await delay(2000); // wait for 2 seconds (without await, the delay is skipped)
await page.keyboard.press('ArrowUp');

// Resolves after the given number of milliseconds
function delay(milliseconds) {
  return new Promise(resolve => {
    setTimeout(resolve, milliseconds);
  });
}
3

why not just

await page.keyboard.press("PageDown");
jsotola
Mark O
  • That worked for me! Thanks! – Klnh13 Jun 02 '23 at 17:00
  • Glad I scrolled all the way down here! This worked great and is nice and simple. – ericArbour Jul 17 '23 at 17:09
  • For me it didn't work, because I had a "sticky" content at the bottom. The sticky part stayed at the same position without moving all the way down, like it would normally when opening a page. What worked for me was the "scrollTo" solution. Just FYI – mariodev Aug 18 '23 at 19:39
2
await page.keyboard.down('End')

When this runs, Playwright holds down the End key on the keyboard. If you prefer, you can call press in a loop instead, which has the same effect.
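A minimal sketch of that loop, assuming a `page` object that exposes `keyboard.press` and `evaluate` (both Puppeteer and Playwright pages do). The function name `pressEndUntilStable` and its `maxPresses` and `delayMs` options are made up for illustration, and the fixed delay is only a guess at how long the site needs to load new content.

```javascript
// Hypothetical helper: press End repeatedly until the document height stops growing.
// Returns the number of key presses that were made.
async function pressEndUntilStable(page, { maxPresses = 50, delayMs = 500 } = {}) {
  const getHeight = () => page.evaluate(() => document.documentElement.scrollHeight);

  let lastHeight = await getHeight();
  for (let presses = 1; presses <= maxPresses; presses++) {
    await page.keyboard.press('End');
    // Assumed to be long enough for lazily loaded content to arrive
    await new Promise((resolve) => setTimeout(resolve, delayMs));

    const height = await getHeight();
    if (height === lastHeight) {
      return presses; // nothing new was added after this press
    }
    lastHeight = height;
  }
  return maxPresses; // gave up: the page may scroll indefinitely
}
```

It could be called as `await pressEndUntilStable(page, { delayMs: 1000 })` after `page.goto(...)`. The `maxPresses` cap is there so that a truly infinite feed does not keep the script running forever.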

chrslg
Poker Player
0

I handle scrolling with CodeceptJS (the information herein is relevant for pure Puppeteer too) and the Puppeteer web driver via I.pressKey(). To support macOS, use ['Command', 'DownArrow'] and for other operating systems, use 'End'. Therefore, add two calls to I.pressKey(). As mentioned previously, this might not work in a mobile browser.

This will scroll the focused area to the bottom. Focusing the correct area first is paramount. One way is to click on an element in the desired area, such as a div.

To tell whether the area has actually scrolled, either:

  1. Look for a selector if you are able to compute a selector for new elements

  2. Diff the result of await I.grabPageScrollPosition() before and after the key presses.

  3. If the frontend team is able to help you by adding an element representing “the end,” that’s your most reliable option. However, if infinite truly means infinite, this will not be possible.

What about the network I/O that's needed to retrieve new items? How do you know when to look at the page for new items? Unfortunately, unless your test knows how many items are available (e.g., by calling a REST API) and how many have been downloaded, it can only make a good happy-path guess. Network failures and unexpected latency will always thwart optimistic guesses.

(A) Loop three or so times with a brief wait to guarantee there are no more items.

(B) You might be able to wait for a spinner to disappear.
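Option (A) could look roughly like this in plain Puppeteer rather than CodeceptJS. This is only a sketch under assumptions: the function name `scrollUntilNoNewItems`, its options, and the idea of counting elements that match a selector are illustrative choices, and a slow network can still defeat the fixed wait.

```javascript
// Hypothetical helper: keep pressing End until the number of elements matching
// `selector` has stayed the same for `confirmations` consecutive rounds.
// Returns the final item count.
async function scrollUntilNoNewItems(
  page,
  selector,
  { confirmations = 3, delayMs = 1000, maxRounds = 100 } = {}
) {
  const countItems = () => page.$$eval(selector, (els) => els.length);

  let count = await countItems();
  let unchangedRounds = 0;

  for (let round = 0; round < maxRounds && unchangedRounds < confirmations; round++) {
    await page.keyboard.press('End'); // scrolls the focused area to the bottom
    await new Promise((resolve) => setTimeout(resolve, delayMs)); // brief wait

    const newCount = await countItems();
    if (newCount > count) {
      count = newCount;
      unchangedRounds = 0; // new items arrived, so start confirming again
    } else {
      unchangedRounds++;
    }
  }
  return count;
}
```

For example, `await scrollUntilNoNewItems(page, '.class_name')` after clicking inside the scrollable area. Requiring several unchanged rounds makes a single slow response less likely to end the loop early, but it does not guarantee that everything has loaded.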

Terris