0

I'm trying to convert a PDF to a text file using pdftotxt, but I keep getting an error. I would appreciate any help:

dest <- getwd()

# make a vector of PDF file names
myfiles <- list.files(path = dest, pattern = "pdf",  full.names = TRUE)


lapply(myfiles, function(i) system(paste('"C:/Users/Karan Tibrewal/Downloads/xpdfbin-win-3.04.zip/xpdfbin-win-3.04/bin32/pdftotxt.exe"',
                                     paste0('"', i, '"')), wait = FALSE))

I get this warning:

Warning message: running command '"C:/Users/Karan Tibrewal/Downloads/xpdfbin-win-3.04.zip/xpdfbin-win-3.04/bin64/pdftotxt.exe" "C:/Users/Karan Tibrewal/Documents/cem/12_13.pdf"' had status 127

I can't find the resulting txt file. What's wrong?

Karan Tibrewal
    Maybe you can use the `readPDF()` function from the `tm` package. The function relies on the programs `pdftotext` and `pdfinfo`, which need to be installed and accessible on your computer, but it provides a convenient wrapper that simplifies extracting text from a PDF file in R. – RHertel Jan 10 '16 at 12:02
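A rough sketch of what that comment describes, assuming the `tm` package is installed and `pdftotext` and `pdfinfo` can be found on the PATH. The helper name `read_pdf_text` is made up for illustration, and the `"xpdf"` engine is an assumption about which backend is available on your machine:

```r
# Sketch only: needs the tm package and, for engine = "xpdf",
# the pdftotext and pdfinfo programs reachable on the PATH.
read_pdf_text <- function(pdf_path, engine = "xpdf") {
  # readPDF() does not read anything itself; it returns a reader function.
  reader <- tm::readPDF(engine = engine)
  doc <- reader(elem = list(uri = pdf_path),
                language = "en",
                id = basename(pdf_path))
  # Extracted text as a character vector.
  NLP::content(doc)
}

# Hypothetical usage with the PDF from the question:
# txt <- read_pdf_text("C:/Users/Karan Tibrewal/Documents/cem/12_13.pdf")
# writeLines(txt, "12_13.txt")
```

This still depends on the same external program, so it will not help until `pdftotext` itself can be started from R.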

2 Answers

1

I think the space in the path needs to be escaped. Maybe something like "\ " instead of a plain space between Karan and Tibrewal?
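For what it's worth, on Windows a space in a path is usually handled by quoting the whole path rather than by a backslash. A small sketch with base R's `shQuote()`, using the PDF path from the warning message as the example input:

```r
# Path containing a space, taken from the warning in the question.
pdf_path <- "C:/Users/Karan Tibrewal/Documents/cem/12_13.pdf"

# type = "cmd" wraps the string in double quotes, as the Windows shell expects.
quoted <- shQuote(pdf_path, type = "cmd")

# Print it to see exactly what would be handed to the shell.
cat(quoted, "\n")
```

The printed value should be the same path wrapped in one pair of double quotes.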

BioProgram
0

I think you are getting the error because of the spaces in the file path. A possible solution is to wrap the entire file path in double quotes. Print the assembled command (for example in a message box) and check that the full path really ends up inside double quotes.

Use this:

'"""C:/Users/Karan Tibrewal/Downloads/xpdfbin-win-3.04.zip/xpdfbin-win-3.04/bin32/pdftotxt.exe"""'

paste0('"""', i, '"""')
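A minimal sketch of the "print it and check" step, not a tested fix. The `exe` path is a placeholder you would have to replace with the location of the extracted binary, and `dq()` and `build_cmd()` are helper names invented here. Two things I am assuming are worth checking along the way: `system()` reports status 127 when the command could not be run at all, and Xpdf's extractor is named `pdftotext`, so the spelling and location of the executable matter as much as the quoting.

```r
# Placeholder location of the extracted Xpdf binary; adjust to your machine.
exe <- "C:/path/to/xpdfbin-win-3.04/bin64/pdftotext.exe"

# Wrap a string in one pair of double quotes.
dq <- function(x) paste0('"', x, '"')

# Assemble the command line: quoted executable, then quoted PDF path.
build_cmd <- function(exe, pdf) paste(dq(exe), dq(pdf))

pdfs <- list.files(path = getwd(), pattern = "\\.pdf$",
                   full.names = TRUE, ignore.case = TRUE)

for (pdf in pdfs) {
  cmd <- build_cmd(exe, pdf)
  message("Command: ", cmd)          # inspect the quoting here
  if (file.exists(exe)) {
    status <- system(cmd, wait = TRUE)
    message("Exit status: ", status)  # 0 means the program reported success
  } else {
    message("Executable not found: ", exe)
  }
}
```

If the "Executable not found" message appears, the quoting is not the problem; the path to the program is.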