Problems with downloading a PDF file using R

Question

I would like to download a PDF file from the internet and save it to my local hard drive. After the download, the output PDF file has lots of empty pages. What can I do to fix it?

Example:

require(XML)
url <- ('http://cran.r-project.org/doc/manuals/R-intro.pdf')
download.file(url, 'introductionToR.pdf')

Thanks in advance.

I copied and pasted your code and got the 109-page document, as it should be. Maybe a problem with your PDF viewer? – vaettchen, Feb 14 '12 at 16:23
Works fine for me (R 2.14.1, Linux). Could you post the results of `sessionInfo()`? It does seem likely to be a viewer or some other OS issue, as this is pretty basic functionality. By the way, you don't need the `XML` package for this: `download.file` is part of base R. – Ben Bolker, Feb 14 '12 at 16:31
PS. I'm guessing you're on Windows: `?download.file` says: "Code written to download binary files must use ‘mode = "wb"’, but the problems incurred by a text transfer will only be seen on Windows." — Ben Bolker, Feb 14 '12 at 16:33
I had the same problem as the OP: the downloaded PDF was corrupted. The `mode = "wb"` argument solved the problem. – userJT, Mar 12 '15 at 09:30

score 49 · Accepted Answer · answered Feb 14 '12 at 16:26

Try binary mode (`mode = "wb"`), like this:

download.file(url, 'introductionToR.pdf', mode = "wb")

It works for me that way.

Sophia

This answer saved me a great deal of work (on Windows)! – userJT Mar 12 '15 at 09:31

To add an explanation, `mode="wb"` tells the function to treat the file as binary rather than text. – Matt Jun 16 '17 at 17:10
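
To make the binary-mode point concrete, here is a minimal sketch of a wrapper around `download.file`. The helper name `download_pdf` is hypothetical (it is not from this thread or from base R). It always passes `mode = "wb"` and then reads the first five bytes to confirm that the saved file starts with the PDF signature `%PDF-`. Note that this check only catches cases where something other than a PDF was saved (for example an HTML error page). It would not detect the line-ending changes that a text-mode transfer makes on Windows, so `mode = "wb"` is still the actual fix.

```r
# Hypothetical helper (an illustration, not part of base R):
# download a PDF in binary mode and sanity-check the result.
download_pdf <- function(url, destfile) {
  # mode = "wb" writes the bytes exactly as received; on Windows a text-mode
  # transfer would rewrite "\n" as "\r\n" and damage the file.
  status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
  if (status != 0) {
    stop("download failed with status ", status)
  }

  # A PDF file starts with the bytes "%PDF-".
  header <- readBin(destfile, what = "raw", n = 5)
  if (!identical(header, charToRaw("%PDF-"))) {
    warning("'", destfile, "' does not start with the PDF signature")
  }

  invisible(destfile)
}
```

With the URL from the question, the call would be `download_pdf('http://cran.r-project.org/doc/manuals/R-intro.pdf', 'introductionToR.pdf')`.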

Selcuk Akbas · Answer 2 · 2018-03-06T10:52:26.207

You can download PDFs and extract their tables as data frames using the tabulizer package:

https://ropensci.org/tutorials/tabulizer_tutorial.html

install.packages("ghit")  # the install_github() calls below come from ghit, not devtools
# on 64-bit Windows
ghit::install_github(c("ropenscilabs/tabulizerjars", "ropenscilabs/tabulizer"), INSTALL_opts = "--no-multiarch")
# elsewhere
ghit::install_github(c("ropenscilabs/tabulizerjars", "ropenscilabs/tabulizer"))

library(tabulizer)

f2 <- "https://github.com/leeper/tabulizer/raw/master/inst/examples/data.pdf"
extract_tables(f2, pages = 1, method = "data.frame")
