Is there a way in Python (a tool, a function, etc.) to convert my PDF file to doc or docx?

I am aware of online converters but I need this in Python code.

santhosh kumar

2 Answers

If your PDF has a lot of pages, the code below will extract the text from all of them:

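import PyPDF2

path = "C:\\ .... "
text = ""
with open(path, "rb") as pdf_file:
    # Current PyPDF2 names; older releases spelled these
    # PdfFileReader, numPages, getPage and extractText.
    read_pdf = PyPDF2.PdfReader(pdf_file)
    for page in read_pdf.pages:
        # Only the plain text is extracted; layout, fonts and images are lost.
        text += page.extract_text()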
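
The loop above only collects plain text in `text`; it does not produce a Word file yet. One possible way to finish the job is sketched below. It assumes the third-party python-docx package is installed, which the answer does not mention. The function names `split_paragraphs` and `save_text_as_docx` are illustrative. Treating blank lines as paragraph breaks is also an assumption, since text extracted from a PDF often has no reliable paragraph markers.

```python
import re


def split_paragraphs(text):
    """Split text on blank lines and collapse whitespace inside each piece."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(part.split()) for part in parts if part.strip()]


def save_text_as_docx(text, out_path):
    """Write each paragraph of `text` to a new .docx file at `out_path`."""
    # Imported here so split_paragraphs stays usable without python-docx.
    from docx import Document  # third-party: pip install python-docx

    document = Document()
    for paragraph in split_paragraphs(text):
        document.add_paragraph(paragraph)
    document.save(out_path)
```

Calling `save_text_as_docx(text, "output.docx")` after the extraction loop would write the text to a .docx file (the file name is a placeholder). The result contains the words only, not the original formatting.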
Rahul Agarwal
  • Yes, it will. I have studied this extensively, and there is no way to get an as-is conversion in Python. The next best solution is online tools, which you are not interested in. – Rahul Agarwal Sep 14 '18 at 09:19

If you happen to have MS Word installed, there is a really simple way to do this using COM. Here is a script I wrote that converts PDF files to docx by calling the Word application.

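import glob
import os

import win32com.client  # pywin32; needs Windows with MS Word installed

word = win32com.client.Dispatch("Word.Application")
word.Visible = False  # keep the Word window hidden

pdfs_path = ""  # folder where the .pdf files are stored
reqs_path = ""  # folder where the .docx files will be written

try:
    for doc in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
        in_file = os.path.abspath(doc)
        # Reuse the PDF's base name and swap the extension for .docx
        name = os.path.splitext(os.path.basename(in_file))[0]
        out_file = os.path.abspath(os.path.join(reqs_path, name + ".docx"))
        print(in_file, "->", out_file)
        wb = word.Documents.Open(in_file)  # Word converts the PDF when opening it
        wb.SaveAs2(out_file, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
        wb.Close()
        print("success...")
finally:
    word.Quit()  # always shut down the background Word process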
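
The file-name handling is the part of this script that is easiest to get wrong, because it splits on backslashes and slices off the last four characters. As a hedged sketch, it could be isolated in a small helper built on the standard `pathlib` module. The name `docx_path_for` and its `out_dir` parameter are illustrative and not from the answer.

```python
from pathlib import Path


def docx_path_for(pdf_path, out_dir):
    """Return the path in `out_dir` with the PDF's base name and a .docx extension."""
    # Path.stem drops only the final extension, so "report.v2.pdf" -> "report.v2"
    return Path(out_dir) / (Path(pdf_path).stem + ".docx")
```

Inside the loop, `out_file` could then be computed as `str(docx_path_for(in_file, reqs_path).resolve())`, since Word expects an absolute path string.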
Ahsin Shabbir