Iterate over a list of URLs in R

Question

I want to import a .txt file that contains a list of URLs, extract data from each one, and save the results in a CSV file, but I'm stuck.

Importing the .txt file works, but when I iterate over the rows, I only extract from the first one:

library(rvest)
library(tidyr)
library(dplyr)

for(i in seq(list_url)) {
    text <- read_html(list_url$url[i]) %>%html_nodes("tr~ tr+ tr strong") %>%html_text()}

I only get the result from the first URL, and it is stored as a plain value rather than a data frame. I want a data frame with the extracts from all the URLs.

Edit: the list_url file is full of URLs like these:

http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=_b8I7G9olKAukGNlsRE6RHSYaYPu8YLjhTEW15HEuj4.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=ewwF4WmHAnOkCg8Y_XIFH705H_O5hJL9uB5hztOhrsE.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=Z9BDo7ACNDbsUwTiVFTe9aKFfcLAxxnU2AtL6DCloX4.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=NZPRa9SoKHVJQcZ64_4zVgcLSNKmHZ4MtorPu23MUPg.

Could you please provide the data in `list_url`. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for help — Harrison Jones, Sep 27 '21 at 13:58
You overwrite your `text` object in each pass, not sure how one would expect differently. Try `out <- do.call(rbind, lapply(list_url, function(url) html_text(html_nodes(read_html(url), "..."))))` — r2evans, Sep 27 '21 at 13:58
please provide the list as is, not links to the urls. You should use `dput(list_url)`. Paste the output of dput in your question. — GuedesBF, Sep 27 '21 at 22:26

score 0 · Answer 1 · answered Sep 27 '21 at 14:04

Are you sure the text variable holds the result for the first URL? It should hold the last one, because every iteration of the for loop overwrites the value in text.

lapply() is a good fit here: it returns one list element per URL, so nothing gets overwritten.

This does what you are trying to achieve:

text <- 
  lapply(list_url$url,
         \(x) read_html(x) %>% 
           html_nodes("tr~ tr+ tr strong") %>% 
           html_text())

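With sapply() instead, the result is simplified where possible: you get a vector if every URL yields exactly one value, a matrix if they all yield the same number of values, and a list otherwise. That might be helpful for the following steps. You might also want to look at purrr, which provides a suite of *apply()-like functions.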
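
To get from that list to the data frame and CSV file the question asks for, one option is to have each URL produce a small data frame and then stack them. This is only a minimal sketch, assuming list_url$url is a character vector; the helper names scrape_one() and scrape_all() and the file name output.csv are hypothetical and do not come from the original posts:

```r
# Hypothetical helper: scrape one page into a data frame, one row per match.
scrape_one <- function(u) {
  page   <- rvest::read_html(u)
  values <- rvest::html_text(rvest::html_nodes(page, "tr~ tr+ tr strong"))
  # rep() keeps both columns the same length even when nothing matches
  data.frame(url = rep(u, length(values)), value = values,
             stringsAsFactors = FALSE)
}

# Hypothetical helper: apply a scraper to every URL and stack the results.
scrape_all <- function(urls, scrape = scrape_one) {
  do.call(rbind, lapply(urls, scrape))
}

# Usage (needs network access, so it is left commented out):
# result <- scrape_all(list_url$url)
# write.csv(result, "output.csv", row.names = FALSE)
```

Keeping the URL as a column makes it possible to tell afterwards which page each extracted value came from.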

thank you so much but now i have this error.... Error in open.connection(x, "rb") : Empty reply from server — nicolas hernandez, Sep 27 '21 at 14:44
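
An error like this stops the whole lapply() call at the first URL that fails. One possible way to keep going is to wrap the request in tryCatch() so that a failing URL yields NA instead of aborting the run. This is a hedged sketch: the helper name safe_extract() and its read argument are hypothetical, and it does not address whatever makes the server return an empty reply; it only reports which URLs failed.

```r
# Hypothetical helper: extract the text for one URL, returning NA on any error.
safe_extract <- function(u, read = rvest::read_html) {
  tryCatch({
    page <- read(u)
    rvest::html_text(rvest::html_nodes(page, "tr~ tr+ tr strong"))
  }, error = function(e) {
    # Report the failing URL and the error message, then carry on
    message("Failed: ", u, " (", conditionMessage(e), ")")
    NA_character_
  })
}

# Usage (needs network access, so it is left commented out):
# text <- lapply(list_url$url, safe_extract)
```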

score 0 · Answer 2 · answered Sep 27 '21 at 17:49

You should create an output object, then populate element i of that output object on each iteration. As is, your code just overwrites the same object on every pass.

library(rvest)
library(tidyr)
library(dplyr)

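urls <- list_url$url  # iterate over the URLs; length(list_url) counts columns if list_url is a data frame
text <- vector('list', length = length(urls))  # create the output object
for (i in seq_along(urls)) {
    text[[i]] <- read_html(urls[i]) %>% html_nodes("tr~ tr+ tr strong") %>% html_text()
}
text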

answered Sep 27 '21 at 17:49

GuedesBF


thank you but it returns only the first url – nicolas hernandez Sep 27 '21 at 19:00
It is hard to get it right without a proper reproducible example. Maybe if you provide a sample of list_url we can help you more – GuedesBF Sep 27 '21 at 20:05
yes i edit it. thank you – nicolas hernandez Sep 27 '21 at 22:23
please provide the list with `dput(list_url)` – GuedesBF Sep 27 '21 at 22:26
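
A possible explanation for getting only the first URL, assuming list_url is a data frame with a single url column (the list_url$url in the question suggests this, but the thread never confirms it): seq() and length() applied to a data frame count its columns, not its rows, so a loop over seq(list_url) runs only once. The sample data below is made up for illustration:

```r
# Made-up stand-in for the imported file: one column, three rows
list_url <- data.frame(url = c("http://example.com/a",
                               "http://example.com/b",
                               "http://example.com/c"),
                       stringsAsFactors = FALSE)

cols_idx <- seq(list_url)            # indexes the columns of the data frame
rows_idx <- seq_along(list_url$url)  # indexes the URLs themselves

print(cols_idx)  # [1] 1
print(rows_idx)  # [1] 1 2 3
```

Under that assumption, looping over seq_along(list_url$url) (or seq_len(nrow(list_url))) visits every URL.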
