I'm building an abstraction layer over Puppeteer in Node.js to scrape single-page applications. One of its basic actions is scrolling down a page repeatedly in order to trigger AJAX calls. What I do is basically:

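(async () => {
  let numResponses = 0;

  // Count every response whose URL matches the AJAX endpoint.
  page.on('response', res => {
    if (res.url().includes('/someAjaxAction')) {
      numResponses++;
    }
  });

  while (numResponses < 20) {
    await scrollDown(); // Calling my function that scrolls down.
    await Promise.delay(400); // Delay just in case (Promise.delay is not native; e.g. Bluebird provides it).
  }
})();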

I set up the response event listener and count how many times the relevant AJAX call was made. I keep scrolling down until this condition is met, and that's it: I can then use the complete HTML.

The problem is that this forces the client coder to supply the number of AJAX calls they anticipate. What I would like is to somehow recognize when no more scrolling is possible, i.e. we've reached the end of the page.

Any idea how I could abstract such a situation?

i.brod
  • is this working for you? `await page.waitFor( ( ) => (window.innerHeight + window.pageYOffset) >= document.body.offsetHeight - 2 )` – Eduard Jacko Oct 18 '18 at 18:38
  • Can you check this answer and update your post accordingly with all sort of codes? https://stackoverflow.com/a/52886019/6161265 – Md. Abu Taher Oct 19 '18 at 05:45

1 Answer

There is no foolproof way, but I handle infinite scrolling with these steps, in order:

  • Collect the data from target
  • Remove the target element
  • Scroll for a specific amount of time
  • Wait for the new target element to appear
  • ...loop until there is no content left

The easiest way to know it's finished is to wrap the page.waitFor call in try...catch: when no new target element appears, the wait times out and throws, and that is the signal to stop the loop.
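
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: scrapeUntilExhausted, itemSelector and timeout are hypothetical names, the scroll step jumps to the bottom once per pass instead of scrolling for a fixed amount of time, and it uses page.waitForSelector (rather than page.waitFor) and assumes Puppeteer's timeout error carries the name 'TimeoutError'.

```javascript
// Sketch of the collect / remove / scroll / wait loop.
// `page` is assumed to be a Puppeteer Page; `itemSelector` and `timeout`
// are hypothetical parameters chosen for this illustration.
async function scrapeUntilExhausted(page, itemSelector, timeout = 5000) {
  const collected = [];

  for (;;) {
    // Steps 1 and 2: collect the text of each target element, then remove it
    // so that the next wait only succeeds when NEW elements are added.
    const batch = await page.$$eval(itemSelector, (nodes) =>
      nodes.map((node) => {
        const text = node.textContent.trim();
        node.remove();
        return text;
      })
    );
    collected.push(...batch);

    // Step 3: scroll to the bottom to trigger the next AJAX load.
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));

    // Step 4: wait for a new target element. A timeout is treated as
    // "no content left"; any other error is rethrown.
    try {
      await page.waitForSelector(itemSelector, { timeout });
    } catch (err) {
      if (err.name !== 'TimeoutError') throw err;
      break;
    }
  }

  return collected;
}
```

It might be called as const items = await scrapeUntilExhausted(page, '.item'), where '.item' is a placeholder selector. The timeout is a trade-off: too short and a slow response is mistaken for the end of the page, too long and the final pass wastes time.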

Md. Abu Taher