
Is there a way to convert PDF to CSV within R?

Xpdf can convert a PDF to txt from within R like this:

system(paste('"C:/Program Files/Xpdf/pdftotext.exe"', '"C:/Documents and Settings/rM/Desktop/club.pdf"'), wait=FALSE)

Is there something like a pdftocsv.exe, analogous to Xpdf's pdftotext.exe?

I have a large number of PDF files, so I'm looking for a way to loop through them. A single PDF could be converted to CSV with an online service, but hundreds clearly cannot.

Maximilian
  • Converting PDF to **anything** is fraught with risk. But if you're fortunate enough to have PDFs which convert the text correctly, why not just write a loop around the command you quoted? – Carl Witthoft Feb 08 '14 at 13:45
  • The problem is that pdftotext won't preserve the layout. I have a mix of text and numbers in the PDFs, and I find it difficult to locate and correctly assign/separate them after converting to txt. It would be easy to convert with pdftotext and then call write.csv. – Maximilian Feb 08 '14 at 13:49
  • Yeah, and few if any converter tools are guaranteed to preserve the layout. You'd be just about as well off converting the PDFs to images and doing OCR (I'm only partly kidding here) – Carl Witthoft Feb 08 '14 at 14:05
  • Did you try adding the "layout" argument to pdftotext? – A5C1D2H2I1M1N2O1R2T1 Feb 08 '14 at 14:32
  • Also, see this Q&A for ideas: http://stackoverflow.com/questions/18078303/scraping-large-pdf-tables-which-span-accross-multiple-pages – A5C1D2H2I1M1N2O1R2T1 Feb 08 '14 at 14:34
  • Thanks, but I have already seen the ideas at the link provided. – Maximilian Feb 08 '14 at 14:35
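
The two suggestions from the comments (loop over the files, and pass `-layout` to pdftotext) can be sketched roughly as follows. This is only a sketch under several assumptions: the Xpdf path is the one from the question, the folder `C:/path/to/pdfs` is hypothetical, the helper names `layout_to_fields` and `pdf_to_csv` are made up for illustration, and columns in the layout-preserved text are assumed to be separated by at least two spaces. That last assumption is a heuristic and will not hold for every PDF.

```r
# Path to the Xpdf binary, as in the question; adjust for your system.
pdftotext <- "C:/Program Files/Xpdf/pdftotext.exe"

# Split layout-preserved text lines into fields on runs of 2+ spaces.
# Heuristic: assumes columns are at least two spaces apart and that
# single spaces only occur inside a field.
layout_to_fields <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]          # drop blank lines
  strsplit(lines, " {2,}")
}

# Convert one PDF to CSV via an intermediate .txt file next to it.
pdf_to_csv <- function(pdf, exe = pdftotext) {
  txt <- sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)
  csv <- sub("\\.pdf$", ".csv", pdf, ignore.case = TRUE)
  # -layout asks pdftotext to keep the physical layout of the page
  system2(exe, args = c("-layout", shQuote(pdf), shQuote(txt)))
  fields <- layout_to_fields(readLines(txt, warn = FALSE))
  # Pad short rows with NA so all rows fit one rectangular table
  width <- max(lengths(fields), 0)
  rows <- lapply(fields, function(x) c(x, rep(NA_character_, width - length(x))))
  out <- do.call(rbind, rows)
  write.table(out, csv, sep = ",", row.names = FALSE, col.names = FALSE, na = "")
  csv
}

pdf_dir <- "C:/path/to/pdfs"  # hypothetical folder holding the PDFs
pdfs <- list.files(pdf_dir, pattern = "\\.pdf$", full.names = TRUE,
                   ignore.case = TRUE)
csvs <- vapply(pdfs, pdf_to_csv, character(1))
```

Rows whose fields contain double spaces, or tables whose cells are empty, will end up misaligned, so the resulting CSV files still need to be checked by hand.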

1 Answer


There is an R package for converting PDF to CSV here: https://github.com/expersso/pdftables

library(pdftables)
# Requires an API key for the PDFTables web service
convert_pdf('test/index.pdf', output_file = NULL, format = "xlsx-single", message = TRUE, api_key = "insert_API_key")
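
Since the question asks about hundreds of files, the call above could be wrapped in a loop along these lines. This is a sketch, not the package author's recommended usage: the helpers `csv_path` and `convert_all` are hypothetical, the `test` folder is taken from the example call, and `format = "csv"` is assumed to be a value that `convert_pdf()` accepts.

```r
# Derive the output path by swapping the .pdf extension for .csv.
csv_path <- function(pdf) sub("\\.pdf$", ".csv", pdf, ignore.case = TRUE)

# Convert each PDF in turn; returns the CSV paths invisibly.
convert_all <- function(pdfs, api_key) {
  for (pdf in pdfs) {
    # Assumption: "csv" is a supported value for the format argument
    pdftables::convert_pdf(pdf, output_file = csv_path(pdf), format = "csv",
                           message = TRUE, api_key = api_key)
  }
  invisible(csv_path(pdfs))
}

pdfs <- list.files("test", pattern = "\\.pdf$", full.names = TRUE,
                   ignore.case = TRUE)
convert_all(pdfs, api_key = "insert_API_key")
```

Each call sends the file to an external service, so usage limits on the API key may apply when converting many files.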
mphil4