
I am trying to extract data from PDF files that contain multiple tables, but I could not find any package that does this.

So far I have used online tools to convert the PDF to HTML, and then used the XML package to convert the HTML to a data frame.
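For reference, the HTML-to-data-frame step looks roughly like the sketch below. The file and its table contents are made up for illustration, and the sketch assumes the converted HTML contains real `<table>` elements, which not every converter produces.

```r
library(XML)

# Hypothetical stand-in for the HTML file produced by a PDF-to-HTML converter
html_file <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<table>",
  "<tr><th>name</th><th>value</th></tr>",
  "<tr><td>a</td><td>1</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table>",
  "</body></html>"
), html_file)

# readHTMLTable() returns a list with one data frame per <table> element
tables <- readHTMLTable(html_file, stringsAsFactors = FALSE)

# Pick the first table; a real PDF with several tables would yield several entries
first_table <- tables[[1]]
```

It is only the PDF-to-HTML conversion before this step that I still do by hand.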

Currently I convert the PDF to HTML manually, and I want to automate that step. Is there an R package that can do this?

PS: This is not a duplicate. It is different from "Extracting text data from PDF files".

I want to convert PDF to HTML, not to plain text.

Update: I have since been told that R does not have a package for this.

Community
Stella Hu
  • I don't remember an R package that does it, but why don't you try pasting it from the clipboard? A quick Google search for "copy data from clipboard to R" would help you here. Also, if you're on Unix, try using `pbpaste` to paste your data into CSV files. – Matt Bannert Jan 19 '14 at 15:59
  • I do not want to copy and paste manually. I want to automate the whole process. – Stella Hu Jan 19 '14 at 16:49
  • If this got reopened I'd vote to close right now, as you are asking us to point you to packages rather than to help with specific code. I'd reformat this question, starting with a title that asks for package direction. I'd include a PDF we could use ([you could include a short knit script to produce that](https://github.com/yihui/knitr/blob/master/inst/examples/knitr-minimal.Rnw)). Then I'd do a Google search (where you'll run into the [tm package](http://cran.r-project.org/web/packages/tm/index.html)). Then I'd attempt to work something out and post that code. – Tyler Rinker Jan 19 '14 at 18:45
  • It is possible to convert a PDF file to HTML with the R package RDCOMClient. Here is an answer that I gave in another post: https://stackoverflow.com/questions/45459759/importing-data-from-a-pdf-to-html-using-r/73744624#73744624 – Emmanuel Hamel Sep 16 '22 at 20:30

0 Answers