I have this for loop in an R script:
library(rvest)
library(httr)

url <- "https://example.com"
# html_session() comes from older rvest versions; config() is from httr
page <- html_session(url, config(ssl_verifypeer = FALSE))

# tr elements contain td elements, so select in that order
links <- page %>%
  html_nodes("tr") %>%
  html_nodes("td") %>%
  html_nodes("a") %>%
  html_attr("href")

# the file names are just the last part of each link
base_names <- basename(links)

for (i in seq_along(links)) {
  site <- html_session(URLencode(
    paste0("https://example.com", links[i])),
    config(ssl_verifypeer = FALSE))
  writeBin(site$response$content, base_names[i])
}
This loops through the links and downloads a text file for each one into my working directory. I'm wondering if I can put a return somewhere so that it returns the documents themselves.
The reason is that I'm executing this script in NiFi (using ExecuteProcess), and it isn't sending my scraped documents down the line; instead, the flowfile content is just the head of my R script. I would assume you wrap the for loop in a fun <- function(x) {} definition, but I'm not sure how to integrate the x into an already-working scraper.
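For example, here is an untested sketch of what I mean; scrape_docs is just a placeholder name, and I'm not sure whether returning the file names (or the raw contents) is what NiFi actually needs:

```r
# Untested sketch: wrap the existing loop in a function and return something.
# scrape_docs is a placeholder name I made up.
scrape_docs <- function(links, base_names) {
  for (i in seq_along(links)) {
    site <- html_session(URLencode(
      paste0("https://example.com", links[i])),
      config(ssl_verifypeer = FALSE))
    writeBin(site$response$content, base_names[i])
  }
  # Return the file names that were written -- or should this
  # return the raw document contents instead?
  return(base_names)
}
```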
I need it to send the documents down the flow, and not just this:
Processor config:
Even if you are not familiar with NiFi, help on the R part alone would be great! Thanks.