I'm facing a problem: I'm unable to get all the product data because the website uses lazy loading on the product catalog page, meaning you have to keep scrolling until the whole page is loaded.

I'm getting only the first page of product data.

1 Answer

First, keep in mind that infinite scroll can be implemented in countless ways. Sometimes you have to click buttons along the way or trigger some kind of transition. I will cover only the simplest use case here: scrolling down at a fixed interval and finishing when no new products are loaded.

  1. If you build your own actor with the Apify SDK, you can use the infiniteScroll helper utility function. If it doesn't cover your use case, please give us feedback on GitHub.

  2. If you are using the generic scrapers (Web Scraper or Puppeteer Scraper), infinite scroll is not currently built in (though it may be by the time you read this). It is not complicated to implement yourself, however. Here is a simple solution for Web Scraper's pageFunction:

async function pageFunction(context) {
    // Grab a few utilities from the context
    const { request, log, jQuery } = context;
    const $ = jQuery;

    // Here we define the infinite scroll function, it has to be defined inside pageFunction
    const infiniteScroll = async (maxTime) => {
        const startedAt = Date.now();
        let itemCount = $('.my-class').length; // Update the selector
        while (true) {
            log.info(`INFINITE SCROLL --- ${itemCount} items loaded --- ${request.url}`);
            // timeout to prevent infinite loop
            if (Date.now() - startedAt > maxTime) {
                return;
            }
            scrollBy(0, 9999);
            await context.waitFor(5000); // This can be any number that works for your website
            const currentItemCount = $('.my-class').length; // Update the selector

            // We check if the number of items changed after the scroll, if not we finish
            if (itemCount === currentItemCount) {
                return;
            }
            itemCount = currentItemCount;
        }
    };

    // Generally, you want to do the scrolling only on the category type page
    if (request.userData.label === 'CATEGORY') {
        await infiniteScroll(60000); // Let's try 60 seconds max

        // ... Add your logic for categories
    } else {
        // Any logic for other types of pages
    }
}

Of course, this is a really trivial example. Sometimes it can get much more complicated. I once even used Puppeteer to move the mouse directly and drag a scroll bar that was accessible programmatically.
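If you want to try the stopping logic without launching a browser, the same loop can be sketched with its page interactions injected as callbacks. This is only an illustration under stated assumptions: `scrollUntilStable`, `createFakePage`, and the `getCount`, `scroll`, `wait`, and `now` hooks are hypothetical names introduced here, not part of Web Scraper or the Apify SDK.

```javascript
// Scrolls until the item count stops growing or maxTime (ms) elapses.
// getCount, scroll, wait and now are hypothetical hooks, injected so the
// loop can be exercised without a real page.
async function scrollUntilStable({ getCount, scroll, wait, maxTime, now = Date.now }) {
    const startedAt = now();
    let itemCount = await getCount();
    let scrolls = 0;
    // The time limit prevents an endless loop on pages that never stop loading
    while (now() - startedAt <= maxTime) {
        await scroll();
        scrolls++;
        await wait();
        const currentItemCount = await getCount();
        // No new items after this scroll: assume the end of the list was reached
        if (currentItemCount === itemCount) {
            return { itemCount, scrolls, timedOut: false };
        }
        itemCount = currentItemCount;
    }
    return { itemCount, scrolls, timedOut: true };
}

// Simulated page for experiments: every scroll reveals `batch` more items
// until `total` items are shown.
function createFakePage(total, batch) {
    let loaded = Math.min(batch, total);
    return {
        getCount: async () => loaded,
        scroll: async () => { loaded = Math.min(loaded + batch, total); },
        wait: async () => {},
    };
}
```

Inside a Web Scraper pageFunction, the hooks would presumably map to the calls used above: `getCount` to `$('.my-class').length`, `scroll` to `scrollBy(0, 9999)`, and `wait` to `context.waitFor(5000)`.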

Lukáš Křivka
  • I think the line `itemCount = currentItemCount;` overrides the original count that we need, so can we remove it? – Kunal Khandol Jul 31 '19 at 13:22
  • No, this is intentional. Note that `const currentItemCount = $('.my-class').length;` is actually called after the scroll. So it is checking item count before the scroll vs item count after the scroll. – Lukáš Křivka Aug 01 '19 at 15:40
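
The before-versus-after comparison described in the reply can be traced with a small simulation. This is a minimal sketch with made-up item counts; `scrollsUntilStop` and its parameters are hypothetical names used only for illustration.

```javascript
// Returns the number of scrolls after which the loop detects the end of the
// list, or null if it never does (only the timeout would stop it then).
// `counts` holds the hypothetical item count observed after each scroll.
function scrollsUntilStop(initialCount, counts, updateBaseline) {
    let itemCount = initialCount;
    let scrolls = 0;
    for (const currentItemCount of counts) {
        scrolls++;
        // Compare the count before this scroll with the count after it
        if (itemCount === currentItemCount) {
            return scrolls;
        }
        if (updateBaseline) {
            itemCount = currentItemCount; // the line in question
        }
    }
    return null;
}

// Page starts with 10 items, then shows 20, 30, 30 after successive scrolls
const withUpdate = scrollsUntilStop(10, [20, 30, 30], true);
const withoutUpdate = scrollsUntilStop(10, [20, 30, 30], false);
console.log(withUpdate, withoutUpdate);
```

With the update, the third scroll compares 30 against 30 and the loop stops. Without it, every comparison is made against the initial 10, so the loop never notices that loading has finished.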