I am trying to extract product data from a website that loads its product list as the user scrolls down, and I am using Apify for this. My first thought was to check whether somebody had already solved this, and I found two useful questions: "How to make the Apify Crawler to scroll full page when web page have infinite scrolling?" and "How to scrape dynamic-loading listing and individual pages using Apify?". However, when I tried to apply the functions they mention, my Apify crawler failed to load the content.
I am using a web-scraper based on the code in the basic web-scraper repository.
The website I am trying to get data out of is in this link. For the moment I am just learning, so I only want to get the data out of this one page; I do not need to navigate to other pages.
The PageFunction I am using is the following:
async function pageFunction(context) {
    // Establishing utility constants to use throughout the code
    const { request, log, skipLinks } = context;
    const $ = context.jQuery;
    const pageTitle = $('title').first().text();

    context.log.info('Wait for website to render');
    await context.waitFor(2000);

    // Creating function to scroll the page until the bottom
    const infiniteScroll = async (maxTime) => {
        const startedAt = Date.now();
        let itemCount = $('.upcName').length;
        for (;;) {
            log.info(`INFINITE SCROLL --- ${itemCount} initial items loaded ---`);
            // timeout to prevent infinite loop
            if (Date.now() - startedAt > maxTime) {
                return;
            }
            scrollBy(0, 99999);
            await context.waitFor(1000);
            const currentItemCount = $('.upcName').length;
            log.info(`INFINITE SCROLL --- ${currentItemCount} items loaded after scroll ---`);
            // stop as soon as a scroll adds no new items
            if (itemCount === currentItemCount) {
                return;
            }
            itemCount = currentItemCount;
        }
    };

    context.log.info('Initiating scrolling function');
    await infiniteScroll(60000);
    context.log.info(`Scraping URL: ${context.request.url}`);

    // Collect one result object per product tile
    var results = [];
    $(".itemGrid").each(function() {
        results.push({
            name: $(this).find('.upcName').text(),
            product_url: $(this).find('.nombreProductoDisplay').attr('href'),
            image_url: $(this).find('.lazyload').attr('data-original'),
            description: $(this).find('.block-with-text').text(),
            price: $(this).find('.upcPrice').text()
        });
    });
    return results;
}
I replaced the while(true){...} loop with a for(;;){...} loop because I was getting an "Unexpected constant condition. (no-constant-condition)" ESLint error.
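For reference, the same warning can also be silenced without changing the loop form, by using ESLint's inline disable comment for that rule. The snippet below is only a minimal sketch of that alternative; `countTo` is a hypothetical function used for illustration and is not part of my scraper.

```javascript
// Hypothetical example: counts up to `limit` using a deliberately
// constant loop condition, with the ESLint rule disabled for that line.
function countTo(limit) {
    let n = 0;
    // eslint-disable-next-line no-constant-condition
    while (true) {
        if (n >= limit) {
            break;
        }
        n += 1;
    }
    return n;
}
```

Either form behaves the same at runtime; the choice only affects what the linter reports.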
Also, I have tried varying the magnitude of the scroll and the await periods.
In spite of all this, I cannot seem to get the crawler to return more than 32 results.
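One variation on the scroll loop, sketched here under stated assumptions and not a confirmed fix: instead of stopping the first time the item count is unchanged, tolerate a few consecutive "idle" checks, in case the site takes longer than one wait period to append the next batch. The helper name `scrollUntilStable` and its parameters (`countItems`, `scroll`, `wait`, `maxIdleRounds`, `delay`) are hypothetical; the dependencies are passed in so the logic can be tried outside the browser.

```javascript
// Hypothetical helper: keeps scrolling until the item count has not grown
// for `maxIdleRounds` consecutive checks, or until `maxTime` ms have elapsed.
async function scrollUntilStable({
    countItems,          // () => number of items currently on the page
    scroll,              // () => triggers one scroll step
    wait,                // (ms) => Promise that resolves after the pause
    maxTime = 60000,
    maxIdleRounds = 3,
    delay = 1000,
}) {
    const startedAt = Date.now();
    let itemCount = countItems();
    let idleRounds = 0;

    while (idleRounds < maxIdleRounds && Date.now() - startedAt < maxTime) {
        scroll();
        await wait(delay);
        const currentItemCount = countItems();
        if (currentItemCount > itemCount) {
            // New items appeared: remember the count and reset the idle counter
            itemCount = currentItemCount;
            idleRounds = 0;
        } else {
            idleRounds += 1;
        }
    }
    return itemCount;
}

// Possible use inside the pageFunction (assumes the same selectors as above):
// await scrollUntilStable({
//     countItems: () => $('.upcName').length,
//     scroll: () => window.scrollTo(0, document.body.scrollHeight),
//     wait: (ms) => context.waitFor(ms),
// });
```

The loop condition is not constant here, so it also avoids the no-constant-condition warning.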
Could someone please explain to me what I am doing wrong?
UPDATE: I continued to work on this and could not make it work on the Apify platform, so my original question still stands. However, I did manage to make the scroll function work by running the script from my PC.