I'd like to use Chromote to gather the response bodies of the XHR calls made by a website, but I find the API a bit complex to master, especially the async pipeline.
I guess I need to first enable the Network functionality and then load the page (this much I can do), but then I need to:
- list all XHR calls
- filter them by recognizing patterns in the request URL
- access the response body of the selected requests
Can someone provide any guidance or tutorial material in this regard?
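To make the flow I have in mind concrete, here is a minimal, untested sketch of those three steps using Chromote's default synchronous API; the site URL and the "api" pattern are placeholders, and getResponseBody() can fail if Chrome has already evicted a body from its cache:

library(chromote)

b <- ChromoteSession$new()
b$Network$enable()

# collect the ids of matching XHR responses as they arrive
xhr_ids <- character()
b$Network$responseReceived(callback = function(msg) {
  if (msg$type == "XHR" && grepl("api", msg$response$url)) {
    xhr_ids <<- c(xhr_ids, msg$requestId)
  }
})

b$Page$navigate("https://example.com")  # placeholder URL
b$Page$loadEventFired()                 # block until the load event fires

# fetch the stored bodies once the page has settled
bodies <- lapply(xhr_ids, function(id) {
  b$Network$getResponseBody(requestId = id)
})

This misses XHRs that complete after the load event, which is essentially the same synchronization problem described in the update below.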
UPDATE:
OK, I switched to the crrri package and made a general function for the purpose. The only missing part is some logic to decide when to close the connection and return the results:
get_website_resources <- function(url, url_filter = '.*', type_filter = '.*') {
  # NB: the filter defaults are regexes ('.*'), since str_detect() expects a
  # regular expression and a bare '*' is a syntax error
  library(crrri)
  library(promises)  # provides the %...>% promise pipe
  library(dplyr)
  library(stringr)
  library(jsonlite)
  library(magrittr)

  chrome <- Chrome$new()
  out <- new.env()
  out$l <- list()

  client <- chrome$connect(callback = ~ NULL)
  Fetch <- client$Fetch
  Page <- client$Page

  # Pause every request at the response stage, so the body is available when
  # the requestPaused event fires
  Fetch$enable(patterns = list(list(urlPattern = '*', requestStage = 'Response'))) %...>% {
    Fetch$requestPaused(callback = function(params) {
      if (str_detect(params$request$url, url_filter) &&
          str_detect(params$resourceType, type_filter)) {
        # getResponseBody() is dispatched before continueRequest() below, so
        # the body is read while the request is still paused
        Fetch$getResponseBody(requestId = params$requestId) %...>% {
          resp <- .
          if (resp$body != '') {
            if (resp$base64Encoded) resp$body <- base64_dec(resp$body) %>% rawToChar()
            body <- list(list(
              url = params$request$url,
              response = resp
            )) %>% set_names(params$requestId)
            str(body)  # debug output
            out$l <- append(out$l, body)
          }
        }
      }
      # Resume the request whether or not it matched the filters
      Fetch$continueRequest(requestId = params$requestId)
    })
  } %...>% {
    Page$navigate(url)
  }

  # Returned before the async pipeline has finished: this is the missing part
  out$l
}
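For the missing close/return logic, one possibility is to pump later's event loop until the page's load event has fired. This is an untested sketch: it assumes that, as in chrome-remote-interface, Page$loadEventFired() called without a callback returns a promise, and that chrome$close() exists; the 2-second grace period for late XHRs is arbitrary:

# at the end of get_website_resources(), in place of the bare `out$l`
done <- FALSE
Page$loadEventFired() %...>% {  # may require Page$enable() first
  # arbitrary 2 s grace period so XHRs firing around the load event still land
  later::later(function() done <<- TRUE, delay = 2)
}
while (!done) later::run_now(0.1)  # pump the event loop until flagged done
chrome$close()
out$l

With that in place the call would look like get_website_resources('https://example.com', url_filter = 'api', type_filter = 'XHR'). Alternatively, crrri's perform_with_chrome() wraps an async function and takes care of the setup/teardown synchronization for you.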