I am looking for a server-side way to convert a .doc file to .docx or PDF using Python, without using win32com.client, comtypes, or an API. I am running this on Azure cloud services. If there is any other way, please help!
1

S.R Rahul
- What do you mean by "Azure cloud services"? Do you have SSH access to it? Is it a cloud-based app like Google Docs, or a VM running Linux/Windows? There are many approaches, depending on the setup. – Francesco Mantovani Mar 11 '20 at 10:24
- Thanks for the reply! It is a VM running Linux. – S.R Rahul Mar 12 '20 at 18:26
- Last question: is that VM running Windows 10 or Linux? – Francesco Mantovani Mar 12 '20 at 21:14
- The VM is running Linux. – S.R Rahul Mar 16 '20 at 07:28
- If you tell me why you cannot install 'win32com.client', I can give you more advice. – Francesco Mantovani Mar 17 '20 at 08:35
2 Answers
2
There are a few approaches:
- with unoconv:
unoconv -d document --format=docx test.doc
- with lowriter:
lowriter --convert-to docx test.doc
- with soffice:
soffice --headless --convert-to docx test.doc
- with libreoffice:
libreoffice --convert-to docx test.doc
You can run these commands directly from your terminal, but if you want, you can integrate them into Python as described here:
#!/usr/bin/env python
import glob
import subprocess

# Convert every .doc file in the current directory to .docx
for doc in glob.iglob("*.doc"):
    subprocess.call(['soffice', '--headless', '--convert-to', 'docx', doc])
In the example I'm using soffice, but you can substitute lowriter or libreoffice. unoconv also works, although it takes different options (see the unoconv command above).
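Since the question also asks about PDF, here is a minimal sketch of the same soffice approach wrapped in a function with basic error handling. The function names `build_convert_command` and `convert`, and the default timeout, are my own assumptions for illustration; only the `--headless`, `--convert-to` and `--outdir` options come from the commands shown in the answers.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, target_format="pdf", outdir=".", binary="soffice"):
    """Build the argument list for a headless LibreOffice conversion."""
    return [binary, "--headless", "--convert-to", target_format,
            "--outdir", str(outdir), str(src)]


def convert(src, target_format="pdf", outdir=".", binary="soffice", timeout=120):
    """Convert src and return the expected output path (hypothetical helper)."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH; is LibreOffice installed?")
    cmd = build_convert_command(src, target_format, outdir, binary)
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(cmd, check=True, timeout=timeout)
    # Assumption: the output keeps the input's base name with the new extension
    out = Path(outdir) / f"{Path(src).stem}.{target_format}"
    if not out.exists():
        raise RuntimeError(f"conversion finished but {out} was not created")
    return out
```

Calling `convert("test.doc", "pdf", outdir="out")` should then leave the result in the `out` directory, provided LibreOffice is installed on the VM. Checking that the output file exists is a precaution in case the exit status alone does not signal a failed conversion.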

Francesco Mantovani
-1
This requires LibreOffice:
import os
import subprocess
import tempfile


def doc2docx(content):
    """Convert .doc bytes to .docx bytes with LibreOffice."""
    with tempfile.TemporaryDirectory() as tmpdirname:
        filename = os.path.join(tmpdirname, 'filename.doc')
        with open(filename, 'wb') as doc:
            doc.write(content)
        # Pass the arguments as a list so the path needs no shell quoting
        subprocess.run(
            ['soffice', '--headless', '--convert-to', 'docx',
             '--outdir', tmpdirname, filename],
            check=True,
        )
        # LibreOffice writes filename.docx into the output directory
        with open(filename + 'x', 'rb') as docx:
            return docx.read()


with open('test.doc', 'rb') as f:
    content = f.read()
content = doc2docx(content)
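If you want to sanity-check the converted bytes before storing them, one option is to test that they form a ZIP archive containing `word/document.xml`, which is where .docx files typically keep the main document. This is only a rough sketch; the helper name `looks_like_docx` is my own and it does not validate the document itself.

```python
import io
import zipfile


def looks_like_docx(content: bytes) -> bool:
    """Return True if content is a ZIP archive containing word/document.xml."""
    buffer = io.BytesIO(content)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        return "word/document.xml" in archive.namelist()
```

You could call `looks_like_docx(content)` on the bytes returned by the conversion and raise an error if it returns False.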

xmduhan