1

I am looking for a server-side way to convert a .doc file to .docx or PDF using Python, without using win32com.client, comtypes or an API. I am running it on Azure cloud services. If there is any other way, please help!

S.R Rahul

2 Answers

2

There are a few approaches:

  • with unoconv: unoconv -d document --format=docx test.doc
  • with lowriter: lowriter --convert-to docx test.doc
  • with soffice: soffice --headless --convert-to docx test.doc
  • with libreoffice: libreoffice --convert-to docx test.doc

You can run these commands directly from your terminal, or, if you prefer, integrate them into Python as shown here:

#!/usr/bin/env python

import glob
import subprocess

for doc in glob.iglob("*.doc"):
    subprocess.call(['soffice', '--headless', '--convert-to', 'docx', doc])

The example uses soffice, but you can substitute unoconv, lowriter or libreoffice, adjusting the arguments to match the commands listed above.
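For use from a server process, it can help to pick whichever launcher is actually installed and to fail loudly when the conversion does not succeed. The following is only a minimal sketch under stated assumptions: the helper names `find_office_binary`, `build_convert_command` and `convert` are made up for illustration, a LibreOffice binary is assumed to be on PATH, and the target format (`docx` or `pdf`) is passed straight to `--convert-to`.

```python
import shutil
import subprocess
from pathlib import Path

# Candidate launcher names taken from the list above; which one exists
# depends on how LibreOffice was installed.
CANDIDATES = ("soffice", "libreoffice", "lowriter")


def find_office_binary(candidates=CANDIDATES):
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def build_convert_command(binary, src, target_format="docx", outdir="."):
    """Build the argument list for a headless conversion (no shell involved)."""
    return [binary, "--headless", "--convert-to", target_format,
            "--outdir", str(outdir), str(src)]


def convert(src, target_format="docx", outdir=".", timeout=120):
    """Run the conversion and return the path where the output is expected."""
    binary = find_office_binary()
    if binary is None:
        raise RuntimeError("No LibreOffice binary found on PATH")
    cmd = build_convert_command(binary, src, target_format, outdir)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True, timeout=timeout)
    return Path(outdir) / (Path(src).stem + "." + target_format)
```

With LibreOffice installed, `convert("test.doc", "pdf")` would be expected to leave `test.pdf` in the current directory. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.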

Francesco Mantovani
-1

This requires LibreOffice.

    import os
    import subprocess
    import tempfile

    def doc2docx(content):
        """Convert .doc bytes to .docx bytes with LibreOffice."""
        with tempfile.TemporaryDirectory() as tmpdirname:
            filename = os.path.join(tmpdirname, 'filename.doc')
            with open(filename, 'wb') as doc:
                doc.write(content)
            # Convert only after the input file has been closed (and flushed).
            subprocess.run(
                ['soffice', '--headless', '--convert-to', 'docx',
                 '--outdir', tmpdirname, filename],
                check=True,
            )
            # LibreOffice writes filename.docx into the output directory.
            with open(filename + 'x', 'rb') as docx:
                return docx.read()

    with open('test.doc', 'rb') as f:
        content = f.read()
    content = doc2docx(content)
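Since the question also asks about PDF, the same temporary-directory approach can be generalized to other target formats. This is a hedged sketch rather than a definitive implementation: `convert_bytes` and its `runner` parameter are hypothetical names introduced here (the `runner` hook only exists so the external command can be swapped out in tests), and `soffice` is assumed to be on PATH.

```python
import os
import subprocess
import tempfile


def convert_bytes(content, target_format="pdf", runner=subprocess.run):
    """Convert .doc bytes to target_format bytes using soffice.

    `runner` is a hypothetical hook; by default it is subprocess.run.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "input.doc")
        with open(src, "wb") as f:
            f.write(content)
        # The input file is closed before the converter reads it.
        runner(
            ["soffice", "--headless", "--convert-to", target_format,
             "--outdir", tmpdir, src],
            check=True,
        )
        out = os.path.join(tmpdir, "input." + target_format)
        # Do not rely on the exit status alone; verify the output exists.
        if not os.path.exists(out):
            raise RuntimeError("conversion produced no output: " + out)
        with open(out, "rb") as f:
            return f.read()
```

Usage would mirror the example above: read `test.doc` as bytes and call `convert_bytes(content, "pdf")` or `convert_bytes(content, "docx")`.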

xmduhan