This is the solution I came up with; hopefully it helps anyone else in the future.
Here is how to do it in Python:
from pdf2docx import parse
# path of pdf file
pdf_file = 'tests/demo_custom.pdf'
# will create .docx in same path
docx_file = 'tests/demo_custom.docx'
# Here is where we convert pdf to docx
parse(pdf_file, docx_file, start=0, end=None)
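If you convert more than one file, it may be handy to wrap the call above in a small function. The following is only a sketch: convert_pdf and default_docx_path are hypothetical helper names of mine, not part of pdf2docx. The only library call used is the same parse() as above. The note about page indexes reflects my reading of the pdf2docx docs, so treat it as an assumption.

```python
from pathlib import Path


def default_docx_path(pdf_file):
    """Return a .docx path next to the given PDF (hypothetical helper)."""
    return str(Path(pdf_file).with_suffix(".docx"))


def convert_pdf(pdf_file, docx_file=None, start=0, end=None):
    """Convert pdf_file to .docx and return the output path (hypothetical helper)."""
    # Fail early with a clear message instead of an error from inside the library.
    if not Path(pdf_file).is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_file}")

    # By default, write the .docx into the same folder as the PDF.
    if docx_file is None:
        docx_file = default_docx_path(pdf_file)

    # Imported here so the path helper still works without pdf2docx installed.
    from pdf2docx import parse

    # Assumption: start/end are zero-based page indexes and end=None means
    # "up to the last page", as in the original call above.
    parse(pdf_file, docx_file, start=start, end=end)
    return docx_file
```

With this in place, convert_pdf('tests/demo_custom.pdf') should write the .docx next to the PDF, the same as the explicit two-path call above.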
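Here is how to do it in R:
Since the solution above is plain Python, we can use reticulate and its py_run_string() function to reuse the same Python code from R. Note that pdf2docx has to be installed in the Python environment that reticulate uses.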
library(reticulate)
py_run_string("from pdf2docx import parse")
# path of pdf file
py_run_string("pdf_file = 'tests/demo_custom.pdf'")
# will create .docx in same path
py_run_string("docx_file = 'tests/demo_custom.docx'")
# Here is where we convert pdf to docx
py_run_string("parse(pdf_file, docx_file, start=0, end=None)")