
I am trying to download several files using web scraping. My code is the following:

library(XML)
library(stringr)

url11 = "http://web.stanford.edu/~jurafsky/slp3/"
links = getHTMLLinks(url11)
links_data = links[str_detect(links, ".pptx")]
links_data

for (i in seq_along(links_data)) {
  a = NULL
  a[i] = basename(links_data[i])
  download.file(url11, destfile = a[i])
}

The code runs and downloads all the files, 13 in this case. However, all the files have the same size, even though their actual sizes differ (I can verify this by downloading the files manually, one by one). In addition, all the files are corrupt: I cannot open them with PowerPoint. I'm using Windows 10, R 3.4.1 64-bit, and RStudio 1.0.143. I would appreciate your help. Thanks in advance.

Sergio
  • You are passing the wrong URL. It should be `download.file(paste0(url11, links_data[i]), destfile = a[i])`. – A Gore Jul 26 '17 at 19:01
  • Thanks a lot! @AGore, your answer solved one part of the problem, namely that the files were all the same size. However, I still cannot open them: I get a message from PowerPoint saying they are damaged or corrupt. I tried different options (`method = "wb"`, `"libcurl"`, `"wininet"`) but it does not improve. – Sergio Jul 26 '17 at 22:11
  • 4
    It is working now. I checked this [link] (https://stackoverflow.com/questions/36252851/pdf-download-is-blank-using-r). I realized that I have to add `mode = "wb"` at the end of download.file. Thanks for your support again. – Sergio Jul 27 '17 at 00:06
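
For future readers, here is a minimal sketch that combines the two fixes mentioned in the comments above: building the full file URL with `paste0()` and downloading in binary mode with `mode = "wb"`. It uses the same packages and URL as the question; the anchored regex `\.pptx$` and the `dest` variable name are small assumptions added for illustration.

library(XML)
library(stringr)

url11 <- "http://web.stanford.edu/~jurafsky/slp3/"

# Collect all links on the page and keep only the .pptx ones
links <- getHTMLLinks(url11)
links_data <- links[str_detect(links, "\\.pptx$")]

for (i in seq_along(links_data)) {
  dest <- basename(links_data[i])
  # Pass the full file URL (page URL + relative link) and use binary
  # mode so each .pptx is written byte-for-byte rather than as text.
  download.file(paste0(url11, links_data[i]), destfile = dest, mode = "wb")
}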
