
I have many thousands of Internet addresses and I have to verify whether each URL still exists. In R (3.1.2 on Ubuntu 14.04) I use the url.exists function from the RCurl package and everything works fine. But there is a single case in which RCurl gets stuck in a loop forever:

url.exists("www.iisgrottaminarda.it")

I have tried the same command on a Mac with the same version of R and the result is

FALSE

Following a suggestion by @Thomas, I tried to check the status of the website with the httr package:

http_status(GET("www.iisgrottaminarda.it"))

and the result is:

$category
[1] "success"

$message
[1] "success: (200) OK"

This is a bit odd...

I also tried using the followLocation parameter to block redirects (also suggested by @Thomas), but without luck.
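For reference, the attempt looked roughly like this; the exact call is my reconstruction, and the followlocation option name is taken from the RCurl/libcurl option set:

```r
library(RCurl)

# Hypothetical sketch of the attempt: disable redirect-following so a
# redirecting site cannot send RCurl around in circles.
url.exists("www.iisgrottaminarda.it",
           .opts = list(followlocation = FALSE))
```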

How can I fix this problem? Thank you.

  • You can set a timeout to abandon checking whether the URL exists. See, for example, http://stackoverflow.com/questions/6733748/how-to-stop-execution-of-rcurlgeturl-if-it-is-taking-too-long – Thomas May 13 '15 at 11:11
  • Thank you for the workaround. I discovered that the problem appears when the website redirects somewhere else... Maybe I should find a way to tell RCurl not to follow redirects... – Gianluca78 May 13 '15 at 11:55
  • You can use the `follow.location=0L` option to disallow that. You may also want to explore httr, which is a wrapper package for RCurl (here's a related question: http://stackoverflow.com/questions/23139357/how-to-determine-if-a-url-object-in-r-base-package-returns-404-not-found). – Thomas May 13 '15 at 11:57
  • Thank you, I have edited my question according to your suggestions. – Gianluca78 May 13 '15 at 12:21
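The timeout workaround from the comments above could be sketched like this; the function name is mine, and the timeout/maxredirs option names are assumptions based on the RCurl/libcurl documentation:

```r
library(RCurl)

# Sketch of the timeout workaround: pass libcurl's timeout option (in
# seconds) so the existence check gives up instead of hanging forever,
# and cap the number of redirects it will follow.
safe_url_exists <- function(u, seconds = 5) {
  tryCatch(
    url.exists(u, .opts = list(timeout = seconds, maxredirs = 5L)),
    error = function(e) FALSE
  )
}

safe_url_exists("www.iisgrottaminarda.it")
```

With this wrapper, a URL that hangs or errors is simply reported as FALSE, which fits a bulk-checking loop over thousands of addresses.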

0 Answers