I want to import a .txt file that contains a list of URLs, extract data from each one, and save the results in a CSV file, but I'm stuck.

First I import the .txt file with no problem, but when I try to iterate over each row, I only extract from the first one.

library(rvest)
library(tidyr)
library(dplyr)

for (i in seq(list_url)) {
    text <- read_html(list_url$url[i]) %>%
        html_nodes("tr~ tr+ tr strong") %>%
        html_text()
}

I only get the result from the first URL, as a single value. I want a data frame with everything extracted from all the URLs.

Edit: the list_url file is full of URLs like these:

http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=_b8I7G9olKAukGNlsRE6RHSYaYPu8YLjhTEW15HEuj4.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=ewwF4WmHAnOkCg8Y_XIFH705H_O5hJL9uB5hztOhrsE.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=Z9BDo7ACNDbsUwTiVFTe9aKFfcLAxxnU2AtL6DCloX4.
http://consultas.pjn.gov.ar/cuantificacion/civil/vida_po_detalle_caso.php?numcas=NZPRa9SoKHVJQcZ64_4zVgcLSNKmHZ4MtorPu23MUPg.

  • Could you please provide the data in `list_url`? See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for help – Harrison Jones Sep 27 '21 at 13:58
  • You overwrite your `text` object in each pass, not sure how one would expect differently. Try `out <- do.call(rbind, lapply(list_url, function(url) html_text(html_nodes(read_html(url), "..."))))` – r2evans Sep 27 '21 at 13:58
  • I just added examples of the data in list_url – nicolas hernandez Sep 27 '21 at 22:24
  • Please provide the list as is, not links to the URLs. You should use `dput(list_url)` and paste its output into your question. – GuedesBF Sep 27 '21 at 22:26

2 Answers

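Are you sure the text variable holds the result of the first URL? If the loop runs over every URL, it should hold the last one, because each pass of the for loop overwrites the value in text. (If list_url is a one-column data frame, seq(list_url) is just 1, so the loop runs only once and you do get the first URL.)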

lapply() is perfect for this and avoids the issues that come with for-loops.

This does what you are trying to achieve:

text <- 
  lapply(list_url$url,
         \(x) read_html(x) %>% 
           html_nodes("tr~ tr+ tr strong") %>% 
           html_text())

Using sapply() instead simplifies the result to a vector (or a matrix) when every URL returns the same number of elements; otherwise it still returns a list. That might be helpful for the following steps. You might also want to look at purrr, which provides a suite of *apply()-like functions.
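Since the question asks for a data frame saved to a CSV file, here is a minimal sketch of one way to get there from the list that lapply() returns. It runs without network access: `urls` and `scraped` are made-up stand-ins for `list_url$url` and the scraped list, and the output file name is an arbitrary choice, not something from the question.

```r
# Hypothetical stand-ins: one character vector of extracted text per URL,
# shaped like the list that lapply() would return. All values are made up.
urls <- c("http://example.com/a", "http://example.com/b")
scraped <- list(c("x1", "x2", "x3"), c("y1", "y2"))

# Long format: one row per extracted string, tagged with its source URL.
# rep() with a vector `times` repeats each URL once per extracted value,
# so URLs that return different numbers of elements are handled as well.
result <- data.frame(
  url   = rep(urls, times = lengths(scraped)),
  value = unlist(scraped),
  stringsAsFactors = FALSE
)

# Save to CSV (written to a temporary directory here; pick your own path)
out_file <- file.path(tempdir(), "extracted.csv")
write.csv(result, out_file, row.names = FALSE)
```

The long format is used because it does not require every page to yield the same number of matches; if each page always returns the same fields, a wide layout built with do.call(rbind, scraped) may be more convenient.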

Till

You should create an output object, then fill element "i" of that object on each pass of the loop. As written, your code overwrites the same object on every iteration, so only one result survives.

library(rvest)
library(tidyr)
library(dplyr)

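# Create the output object with one slot per URL.
# Note: length(list_url) and seq(list_url) count the columns of a data frame,
# not its rows, so index over the url column instead.
text <- vector("list", length = length(list_url$url))
for (i in seq_along(list_url$url)) {
    text[[i]] <- read_html(list_url$url[i]) %>%
        html_nodes("tr~ tr+ tr strong") %>%
        html_text()
}
text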
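One detail worth checking when looping like this: if `list_url` is a data frame (the question accesses it as `list_url$url`, so that is assumed here), length() and seq() count its columns, not its rows. The sketch below uses a made-up three-row data frame to show the difference.

```r
# Hypothetical one-column data frame standing in for list_url
list_url <- data.frame(
  url = c("http://example.com/a",
          "http://example.com/b",
          "http://example.com/c"),
  stringsAsFactors = FALSE
)

n_cols   <- length(list_url)        # number of columns, not URLs
col_idx  <- seq(list_url)           # indexes columns, so a loop runs once here
n_urls   <- nrow(list_url)          # number of rows, i.e. URLs
url_idx  <- seq_along(list_url$url) # one index per URL

print(n_cols)
print(col_idx)
print(n_urls)
print(url_idx)
```

Looping over `seq_along(list_url$url)` (or `seq_len(nrow(list_url))`) visits every URL, whereas looping over `seq(list_url)` only visits as many indices as there are columns.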
GuedesBF