
I know this is possible with Python, but I wanted to see whether it can be done with R.

I have been using the site https://www.viewpdf.com/pdf-to-word.html to convert some PDF outputs to Word. It seems to work well and keeps all the formatting. Is there a way to do the same conversion with R?

I have looked at various packages, such as pdftools, but couldn't find much information on converting to Word.

Thank you.

RL_Pug
  • Here is the reverse, but it could be helpful: https://stackoverflow.com/questions/49113503/how-to-convert-docx-to-pdf – dcsuka Aug 09 '22 at 19:49

2 Answers


This is the solution I came up with; hopefully it helps anyone else in the future.

Here is how to do it in Python, using the pdf2docx package (installable with pip install pdf2docx):

from pdf2docx import parse

# path of pdf file
pdf_file = 'tests/demo_custom.pdf'

# will create .docx in same path
docx_file = 'tests/demo_custom.docx'

# Here is where we convert pdf to docx
parse(pdf_file, docx_file, start=0, end=None)
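If you convert files often, the same call can be wrapped in a small function. This is only a sketch: docx_path_for() and convert_pdf() are hypothetical helper names, not part of pdf2docx, and the only library call used is the parse() shown above.

```python
from pathlib import Path


def docx_path_for(pdf_path):
    """Return the .docx path that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pdf(pdf_path, docx_path=None):
    """Convert one PDF to .docx and return the output path."""
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No such PDF: {pdf_path}")
    out = Path(docx_path) if docx_path else docx_path_for(pdf_path)

    # Imported lazily so the path helper works without pdf2docx installed.
    from pdf2docx import parse

    # Same call as above: start=0, end=None converts every page.
    parse(str(pdf_path), str(out), start=0, end=None)
    return out
```

For example, convert_pdf('tests/demo_custom.pdf') should write tests/demo_custom.docx next to the input, assuming that PDF exists.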
    

Here is how to do it in R

With that in mind, we can use reticulate and its py_run_string() function to reuse the Python code from R. This requires that the Python environment reticulate uses has pdf2docx installed.

library(reticulate)

py_run_string("from pdf2docx import parse")

# path of pdf file
py_run_string("pdf_file = 'tests/demo_custom.pdf'")

# will create .docx in same path
py_run_string("docx_file = 'tests/demo_custom.docx'")

# Here is where we convert pdf to docx
py_run_string("parse(pdf_file, docx_file, start=0, end=None)")
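As an alternative to passing each line through py_run_string(), the Python side could live in its own file that R loads once with reticulate's source_python(). The sketch below assumes a hypothetical file name, pdf_helpers.py, and hypothetical function names find_pdfs() and convert_folder(); only parse() comes from pdf2docx.

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_folder(folder):
    """Convert every PDF in `folder`; return the .docx paths as strings."""
    # Imported here so the file can be loaded before pdf2docx is installed.
    from pdf2docx import parse

    written = []
    for pdf in find_pdfs(folder):
        docx = pdf.with_suffix(".docx")
        parse(str(pdf), str(docx), start=0, end=None)
        written.append(str(docx))
    return written
```

From R, after library(reticulate), calling source_python("pdf_helpers.py") should make convert_folder() available as an R function, so convert_folder("tests") would convert every PDF in that folder. The function returns plain strings to keep the values simple for reticulate to convert.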
RL_Pug
  • I was able to do it with the R package RDCOMClient. See my answer here: https://stackoverflow.com/questions/32846741/convert-pdf-file-to-docx/73720411#73720411 – Emmanuel Hamel Sep 14 '22 at 16:41

Here is one approach for converting a PDF file to DOCX with R. It uses the RDCOMClient package to drive Microsoft Word, so it requires Windows with Word installed:

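library(RDCOMClient)

# Start Word through COM and suppress its dialog boxes
wordApp <- COMCreate("Word.Application")
wordApp[["Visible"]] <- TRUE
wordApp[["DisplayAlerts"]] <- FALSE

# Input PDF and output Word file
path_To_PDF_File <- "xxx.pdf"
path_To_Word_File <- "xxx.docx"

# Open the PDF in Word (Word converts it on open), then save it under the new name
doc <- wordApp[["Documents"]]$Open(normalizePath(path_To_PDF_File),
                                   ConfirmConversions = FALSE)
doc$SaveAs2(path_To_Word_File)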
Emmanuel Hamel