I need a hand generating a list of URLs. I am trying to build the list with the following lines in RStudio:

library(RCurl)
links_list <- list()
for (j in 10:46) {
  for (k in 10:99) {
    # the literal "0"s pad j and k to three digits
    url <- paste0("https://www.tbmm.gov.tr/tutanaklar/TUTANAK/TBMM/d26/c0", j, "/tbmm260", j, "0", k, ".pdf")
    # keep the URL only if url.exists() reports that it exists
    if (url.exists(url)) {
      links_list <- c(links_list, url)
    }
  }
}

My aim is to skip the URLs that do not exist. Before I added the if, the code ran fine, but the result was a list of more than 3,000 URLs, most of which do not exist.

I'm working on a Windows PC, and there this code doesn't produce any URLs: the list is still empty when the run finishes. I tried the same code on a Mac. It seemed to work, but the code never stopped running.

I would appreciate it if anyone could come up with an idea!

Thanks...

markus

1 Answer

It would be easiest if you kept track of the status of all urls so you could better understand what was happening. Try:

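library(RCurl)  # provides getCurlHandle() and url.exists()
library(dplyr)
j <- 10:46
k <- 10:99
df <- expand.grid(j = j, k = k)  # one row per (j, k) combination
h <- getCurlHandle()             # reuse one curl handle for all requests
df <- df %>%
  mutate(url = paste0("https://www.tbmm.gov.tr/tutanaklar/TUTANAK/TBMM/d26/c0", j, "/tbmm260", j, "0", k, ".pdf")) %>%
  mutate(exists = sapply(url, url.exists, curl = h))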

In the code above, each row of df holds a URL and the value that url.exists() returned for it. You can then keep only the URLs that exist with:

df %>%
  filter(exists)%>%
  pull(url)

Do note that I tried this on a few of the generated URLs and none of them existed, so that may be the issue.
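
The paste0() pattern hard-codes the leading zeros, so it is only right while j and k have exactly two digits. A minimal sketch, assuming the site zero-pads both numbers to three digits (a guess from the pattern in the question, not something confirmed here), is to build the URLs with sprintf() and pass the existence check in as an argument, so the filtering can be tried offline first. build_urls, keep_existing and fake_check are hypothetical names.

```r
# Build every candidate URL. Assumes three-digit zero padding for j and k,
# which reproduces the paste0() output for two-digit values.
build_urls <- function(j = 10:46, k = 10:99) {
  grid <- expand.grid(j = j, k = k)
  sprintf(
    "https://www.tbmm.gov.tr/tutanaklar/TUTANAK/TBMM/d26/c%03d/tbmm26%03d%03d.pdf",
    grid$j, grid$j, grid$k
  )
}

# Keep only the URLs for which check_fn returns TRUE.
# check_fn is any function that takes one URL and returns TRUE/FALSE,
# e.g. function(u) RCurl::url.exists(u) for the real network check.
keep_existing <- function(urls, check_fn) {
  ok <- vapply(urls, function(u) isTRUE(check_fn(u)), logical(1), USE.NAMES = FALSE)
  urls[ok]
}

# Offline stand-in for the network check (hypothetical): pretend that only
# files whose name ends in "010.pdf" exist.
fake_check <- function(u) grepl("010\\.pdf$", u)

urls <- build_urls(j = 10:11, k = 10:12)
found <- keep_existing(urls, fake_check)
print(found)
```

Once the filtering behaves as expected with the stand-in, fake_check can be swapped for a function that calls url.exists().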

Rohit
  • Thanks a lot for your help! As you said, df shows that none of the URLs exist, but when I tried to open a couple that I'm sure exist, they opened without any problem. I don't get why all the URLs are shown as FALSE. – Utku G. Mar 19 '19 at 11:35
  • Can you list a valid example that doesn't work but turns up in your browser? Also, there may be an issue with your proxy settings. See: https://stackoverflow.com/questions/29130237/rcurl-url-exists-returns-false-when-url-does-exists – Rohit Mar 19 '19 at 12:04
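
One way to see why a URL that opens in the browser comes back FALSE is to ask url.exists() for the parsed response header instead of a logical, through its .header = TRUE argument. The sketch below is only an assumption about how one might inspect this; inspect_url and describe_check are hypothetical helpers, and nothing here has been run against the site in question.

```r
# Turn the result of url.exists(..., .header = TRUE) into a short label.
# That call returns FALSE when the request itself failed; otherwise it
# returns a named character vector that includes a "status" element.
describe_check <- function(res) {
  if (is.logical(res) && !isTRUE(res)) {
    return("request failed (e.g. DNS, proxy, SSL, or timeout)")
  }
  status <- unname(res["status"])
  if (is.na(status)) "no status in header" else paste("HTTP status", status)
}

# Hypothetical wrapper: needs the RCurl package and network access.
inspect_url <- function(url) {
  res <- RCurl::url.exists(url, .header = TRUE)
  describe_check(res)
}

# Offline examples with hand-made inputs
print(describe_check(FALSE))
print(describe_check(c(status = "404", statusMessage = "Not Found")))
```

A "request failed" label points at the connection (proxy or certificate settings, for instance) rather than at the URL itself, while a status code shows what the server actually answered.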