Hi, I am using the crrri R package to build a scraper. I have an asynchronous function that dumps the DOM, but I keep getting blank output even though I call Page$loadEventFired() in the loop.

I tried to create a Runtime check that waits until I can get the elements from the page, but I get lost in the promises and cannot get it to work. Any ideas?

The wait function:


  if (try < 10 & Runtime$evaluate(
        expression = paste0('document.getElementsByClassName("', element_class, '").length')
      ) %...>% .$result %...>% .$value %...>% . > 0) {
    Sys.sleep(0.5)
    wait_for_element(Runtime, element_class, try + 1)
  }

The whole code:

  crrri::perform_with_chrome(extra_args = c('--blink-settings=imagesEnabled=false'),function(client) {
    Network <- client$Network
    Page <- client$Page
    Runtime <- client$Runtime
    Network$enable() %...>% {
      Page$enable()
    } %...>% {
      Network$setCacheDisabled(cacheDisabled = TRUE)
    } %...>% {
      Page$navigate(url = url)
    } %...>% {
      Page$loadEventFired()
    } %...>% {
      Sys.sleep(1)
    } %...>% {
      Runtime$evaluate(
        expression = 'document.documentElement.outerHTML'
      )
    } %...>% (function(result) {
      html <- result$result$value
      #cat(html, "\n")
    })
  })
}
Giuliano

1 Answer

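Sys.sleep() pauses the whole R session and does not return a promise, so it does not fit into the promise chain. Use crrri's wait() instead, which returns a promise that resolves after the delay. That part of the code should look something like this: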

    } %...>% {
      Page$loadEventFired()
    } %>% wait(delay = 1) %...>% {
      Runtime$evaluate(
        expression = 'document.documentElement.outerHTML'
      )
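
If a fixed delay is not reliable enough, polling for the element can also be written as a promise chain. The following is only a minimal sketch, not something taken from the crrri documentation: `delay()` and this version of `wait_for_element()` are hypothetical helpers, the `attempt` and `max_tries` arguments are my own naming, and it assumes `Runtime$evaluate()` resolves to a list with `$result$value`, as in the question. It uses `promises::promise()` and `later::later()` so that the pause between attempts never blocks the session.

```r
library(promises)  # provides %...>% and promise()

# Hypothetical helper: a promise that resolves after `seconds`.
# later::later() schedules the callback without blocking R.
delay <- function(seconds) {
  promise(function(resolve, reject) {
    later::later(function() resolve(TRUE), seconds)
  })
}

# Hypothetical helper: poll until at least one element with the given
# class exists, or until `max_tries` attempts have been made.
# Resolves to the number of matching elements found on the last attempt.
wait_for_element <- function(Runtime, element_class, attempt = 1, max_tries = 10) {
  js <- paste0('document.getElementsByClassName("', element_class, '").length')
  Runtime$evaluate(expression = js) %...>% (function(res) {
    n <- res$result$value
    if (n > 0 || attempt >= max_tries) {
      return(n)
    }
    # Not there yet: pause without blocking, then try again.
    # Returning a promise from the callback chains it onto the outer one.
    delay(0.5) %...>% (function(ignored) {
      wait_for_element(Runtime, element_class, attempt + 1, max_tries)
    })
  })
}
```

In the chain from the question, a step such as `%...>% { wait_for_element(Runtime, "some-class") }` (where "some-class" is a placeholder) could then go between `Page$loadEventFired()` and the final `Runtime$evaluate()` call.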

giocomai