I'm trying to scrape a website where JavaScript adds new content as the user scrolls down. I use this function to get the DOM:

library(crrri)
dump_DOM <- function(url) {
    perform_with_chrome(function(client) {
        Network <- client$Network
        Page <- client$Page
        Runtime <- client$Runtime
        Network$enable() %...>% {
            Page$enable()
        } %...>% {
            Network$setCacheDisabled(cacheDisabled = TRUE)
        } %...>% {
            Page$navigate(url = url)
        } %...>% {
            Page$loadEventFired()
        } %...>% {
            Runtime$evaluate(
                expression = 'document.documentElement.outerHTML'
            )
        } %...>% (function(result) {
            html <- result$result$value
            return(html)
        })
    },
    extra_args = '--no-sandbox')
}
website <- dump_DOM(url)

I couldn't find a way to scroll the page in headless Chrome, so I tried enlarging the window instead by adding these lines inside the function, but it had no effect:

Emulation <- client$Emulation

Network$enable() %...>% { 
    Page$enable()
} %...>% {
    Emulation$setDeviceMetricsOverride(
        width = 1080,
        height = 10000,
        deviceScaleFactor = 0,
        mobile = FALSE,
        dontSetVisibleSize = FALSE
    )
} %...>% {
....

So the question is: how do I scroll the page down to the bottom? Alternatively, how can I make the 'window size' large enough that the full page loads without any need to scroll down?
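For reference, here is an untested sketch of the kind of scrolling I have in mind. It assumes scrolling can be driven from JavaScript through Runtime$evaluate, and that the awaitPromise parameter of the DevTools Protocol's Runtime.evaluate is exposed by crrri, so the call resolves only after the in-page scroll loop finishes. The names scroll_js and dump_scrolled_DOM, the default round count, and the pause length are all made up for illustration.

```r
library(crrri)
library(promises)

# Hypothetical helper: builds a JavaScript snippet that scrolls to the bottom,
# pauses, and repeats until the page height stops growing or max_rounds is hit.
# The snippet evaluates to a JS promise that resolves to the final outer HTML.
scroll_js <- function(max_rounds = 10, pause_ms = 1000) {
  sprintf(
    "(async () => {
      let previous = 0;
      for (let i = 0; i < %d; i++) {
        window.scrollTo(0, document.body.scrollHeight);
        await new Promise(resolve => setTimeout(resolve, %d));
        const current = document.body.scrollHeight;
        if (current === previous) break;
        previous = current;
      }
      return document.documentElement.outerHTML;
    })()",
    as.integer(max_rounds), as.integer(pause_ms)
  )
}

# Hypothetical variant of dump_DOM() that scrolls before reading the DOM.
dump_scrolled_DOM <- function(url, max_rounds = 10, pause_ms = 1000) {
  perform_with_chrome(function(client) {
    Page <- client$Page
    Runtime <- client$Runtime
    Page$enable() %...>% {
      Page$navigate(url = url)
    } %...>% {
      Page$loadEventFired()
    } %...>% {
      # Assumption: awaitPromise = TRUE makes Chrome wait for the async
      # function above to finish before the result comes back.
      Runtime$evaluate(
        expression = scroll_js(max_rounds, pause_ms),
        awaitPromise = TRUE
      )
    } %...>% (function(result) {
      result$result$value
    })
  },
  extra_args = '--no-sandbox')
}
```

It would be called the same way as before, e.g. website <- dump_scrolled_DOM(url). The rounds multiplied by the pause need to stay below whatever timeout perform_with_chrome() applies, and the fixed pause is only a guess at how long the site needs to load each new batch.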

Ian Campbell
vladli
  • Were you able to solve the above problem? I'm starting to use crrri and having trouble understanding the basics, and there aren't many R-based Chrome DevTools Protocol tutorials – KKW Apr 03 '21 at 18:22
  • Nah, I had to use tricks to load different pages. Try looking into another package that uses Selenium. It is harder to launch but does the job of scrolling. I think it's called RSelenium – vladli Apr 04 '21 at 19:03
  • It's not possible to scroll with crrri afaik @KKW – vladli Apr 04 '21 at 19:04

0 Answers