
I am new to R and would like to seek some advice.

I am trying to download multiple URLs (which point to PDF files, not HTML pages) and save them as PDF files using R.

The links I have are stored as character strings (taken from the HTML source of the website).

I tried using the download.file() function, but it requires a specific URL written in the R script, so it only downloads one file per call. However, I have many URLs and would like help downloading all of them.

Thank you.

poppp
    Hello. Please read here how to make [a helpful example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in R. It would be good to see what you've tried and where you're getting stuck. – LJW Aug 24 '15 at 04:15

2 Answers


I believe you are trying to download a list of URLs. You could try something like this:

  1. Store all the links in a vector using c(), e.g.:
urls <- c("http://link1", "http://link2", "http://link3")
  2. Iterate through the vector and download each file:
for (url in urls) {
    download.file(url, destfile = basename(url))
}

If you're using Linux/Mac and HTTPS, you may need to specify the method and extra arguments for download.file (here -k tells curl to skip SSL certificate verification):

download.file(url, destfile = basename(url), method="curl", extra="-k")
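As a slightly more defensive variant of the loop above, the sketch below wraps each download in tryCatch() so that one failing link does not stop the rest, and passes mode = "wb" so PDFs are written as binary (which matters on Windows). The helper name download_all and its dest_dir and fetch arguments are assumptions made for illustration; fetch defaults to download.file and exists only so the loop can be tried without a network connection.

```r
# Hypothetical helper (not from the original answer): download every URL in
# `urls` into `dest_dir` and return a logical vector of successes, named by URL.
download_all <- function(urls, dest_dir = ".", fetch = download.file) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  vapply(urls, function(u) {
    dest <- file.path(dest_dir, basename(u))
    tryCatch({
      # mode = "wb" writes the file as binary, which PDFs need on Windows
      status <- fetch(u, destfile = dest, mode = "wb")
      identical(as.integer(status), 0L)  # download.file returns 0 on success
    }, error = function(e) {
      # Report the failure and carry on with the next URL
      message("Failed: ", u, " (", conditionMessage(e), ")")
      FALSE
    })
  }, logical(1))
}
```

It could be called as results <- download_all(urls, "pdfs"), after which urls[!results] lists the links that failed.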

If you want, you can test my proof of concept here: https://gist.github.com/erickthered/7664ec514b0e820a64c8

Hope it helps!

Martin Gal
erickthered
  • Thank you so much :) May I know how to rename the files? I tried adding pdf.name <- paste(Date, "-", Heading, sep = " ") before the download line and changing it to destfile = pdf.name, but the files were not renamed. – poppp Aug 25 '15 at 08:49
  • Hi, the basic idea behind renaming the file is to create a new variable that stores the destination filename just before downloading. I've updated the gist https://gist.github.com/erickthered/7664ec514b0e820a64c8 to rename the file based on the current date and the URL file name. The key line is: newName <- paste(format(Sys.time(), "%Y%m%d%H%M"), "-", basename(url), sep = " ") – erickthered Aug 26 '15 at 13:23
  • But I already have the designated date and name for each file. I want to change the original file name to the designated date and name that I already created. Date and Heading are the variables that I have created. – poppp Aug 27 '15 at 04:22
  • Then it should work. Just make sure there are no special chars in either of them (e.g. ":" is a special char). – erickthered Aug 27 '15 at 13:42
  • I did pdf.name <- paste(Date, "-", Heading, sep = " ") but it only returns 1 result – poppp Aug 28 '15 at 02:53
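One possible reason for getting a single result is that Date and Heading each hold only one value, or that pdf.name is built once and never indexed inside the loop. This is a guess, since the asker's data is not shown. paste() is vectorized, so if Date and Heading have one element per URL, it returns one name per URL. The sketch below uses made-up values for Date and Heading and a hypothetical helper, make_pdf_names, that also replaces characters that are awkward in file names (such as ":").

```r
# Made-up example values; the asker's real Date and Heading are not shown.
Date    <- c("2015-08-24", "2015-08-25", "2015-08-26")
Heading <- c("Annual report", "Q2: results", "Press/release")

# Hypothetical helper: build one file name per (date, heading) pair.
make_pdf_names <- function(date, heading) {
  raw <- paste(date, heading, sep = " - ")
  # Replace characters that are not allowed in file names on some systems
  safe <- gsub('[<>:"/\\\\|?*]', "_", raw)
  paste0(safe, ".pdf")
}

pdf.names <- make_pdf_names(Date, Heading)
length(pdf.names)  # one name per element of Date and Heading
```

Inside an indexed loop, the matching name would then be picked with destfile = pdf.names[i] rather than destfile = pdf.names.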
5

URL

url = c('https://cran.r-project.org/doc/manuals/r-release/R-data.pdf',
        'https://cran.r-project.org/doc/manuals/r-release/R-exts.pdf',
        'http://kenbenoit.net/pdfs/text_analysis_in_R.pdf')

Designated names

names = c('manual1',
          'manual2',
          'manual3')

Iterate through the vector and download each file with its corresponding name:

for (i in seq_along(url)) {
    # mode = 'wb' writes binary data, which PDFs need on Windows
    download.file(url[i], destfile = names[i], mode = 'wb')
}
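The designated names above have no file extension, so the saved files may not open as PDFs by double-click. As a rough sketch of the same pairing idea, the hypothetical helper below (download_named, with assumed dest_dir and fetch arguments) checks that there is one name per URL, appends ".pdf" where it is missing, and uses Map() to walk both vectors in parallel instead of an index.

```r
# Hypothetical helper: pair each URL with its designated name and download it.
download_named <- function(urls, names, dest_dir = ".", fetch = download.file) {
  stopifnot(length(urls) == length(names))  # one name per URL
  # Append ".pdf" only when the name does not already end with it
  files <- ifelse(grepl("\\.pdf$", names, ignore.case = TRUE),
                  names, paste0(names, ".pdf"))
  dest <- file.path(dest_dir, files)
  # Map() steps through urls and dest together, like the indexed for loop
  invisible(Map(function(u, d) fetch(u, destfile = d, mode = "wb"), urls, dest))
}
```

With the vectors from this answer, the call would be download_named(url, names).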
Mutyalama