10

I found several questions that were similar to mine, but none of the answers came close to what I need.

Specifications: I'm working with Python 3 and do not have MS Word. My development machine runs OS X, and the cloud machine runs Linux (Ubuntu).

I'm using python-docx to extract values from a .doc file that is sent to me nightly. However, python-docx only works with .docx files, so I need to convert the file to that extension first.

So, I've got a .doc file that I need to convert to .docx. This script might have to run in the cloud so I can't install any kind of Office or Office-like software. Can this be done?

feedMe
zerohedge

5 Answers

13

Since you are working on Linux/Ubuntu, you can use LibreOffice's built-in converter.

SYNTAX

lowriter --convert-to docx *.doc

Example

lowriter --convert-to docx testdoc.doc

The wildcard form converts every .doc file in the current folder to .docx; the example converts a single file. The converted files are saved in that same folder.
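Since the question asks for something a Python script can run nightly, the same conversion can be driven through subprocess. This is only a minimal sketch: it assumes LibreOffice is installed and that its `soffice` binary is on the PATH, and the function names `build_convert_command` and `convert_doc_to_docx` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, outdir):
    """Build the LibreOffice command line that converts `src` to .docx."""
    return [
        "soffice",               # assumption: the LibreOffice binary is on PATH
        "--headless",            # run without opening a GUI
        "--convert-to", "docx",  # target format
        "--outdir", str(outdir), # folder that receives the converted file
        str(src),
    ]


def convert_doc_to_docx(src, outdir):
    """Convert one .doc file and return the path of the expected .docx."""
    src, outdir = Path(src), Path(outdir)
    # check=True raises CalledProcessError if LibreOffice exits with an error
    subprocess.run(build_convert_command(src, outdir), check=True)
    return outdir / (src.stem + ".docx")
```

Calling `convert_doc_to_docx("testdoc.doc", ".")` should leave `testdoc.docx` in the current folder, which python-docx can then open.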

thrinadhn
3

You could use unoconv - Universal Office Converter. Convert between any document format supported by LibreOffice/OpenOffice.

unoconv -d document --format=docx *.doc

Or, from Python (filename is the path to your .doc file):

import subprocess
subprocess.call(['unoconv', '-d', 'document', '--format=docx', filename])
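If you need to know whether the conversion actually succeeded before handing the file to python-docx, something like the following might help. It is a rough sketch: the helper names are hypothetical, and it assumes unoconv writes the output next to the input with the new extension, which is its usual default.

```python
import subprocess
from pathlib import Path


def docx_path_for(doc_path):
    """Path where the converted file is expected: same folder, .docx suffix."""
    return Path(doc_path).with_suffix(".docx")


def convert_with_unoconv(doc_path):
    """Run unoconv on one file and raise if the conversion fails."""
    subprocess.run(
        ["unoconv", "-d", "document", "--format=docx", str(doc_path)],
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    out = docx_path_for(doc_path)
    # Assumption: unoconv placed the result beside the input file
    if not out.exists():
        raise FileNotFoundError(out)
    return out
```

Unlike a bare `subprocess.call`, this fails loudly when unoconv returns an error or no output file appears.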
0

Aspose.Words Cloud SDK for Python can convert DOC to DOCX. The package can open, generate, edit, split, merge, compare and convert a Word document in Python on any platform without depending on MS Word.

It is a paid product, but the free plan provides 150 free monthly API calls.

P.S: I'm developer evangelist at Aspose.

# Import module
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile

# Get your credentials from https://dashboard.aspose.cloud (free registration is required).
words_api = asposewordscloud.WordsApi(app_sid='xxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx',app_key='xxxxxxxxxxxxxxxxxxxxxxxxx')
words_api.api_client.configuration.host = 'https://api.aspose.cloud'

filename = 'C:/Temp/02_pages.doc'
dest_name = 'C:/Temp/02_pages.docx'
# Convert DOC to DOCX
request = asposewordscloud.models.requests.ConvertDocumentRequest(document=open(filename, 'rb'), format='docx')
result = words_api.convert_document(request)
copyfile(result, dest_name)
Tilal Ahmad
0
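import os
import aspose.words as aw

path1 = "doc file path"                 # folder containing the .doc file
path2 = "path to save converted file"   # folder for the .docx output
file = "input.doc"                      # hypothetical name; the original snippet left `file` undefined
file2 = file.rsplit('.', 1)[0] + '.docx'
filename = os.path.join(path1, file)
filename1 = os.path.join(path2, file2)
# Load the .doc and save it again; the .docx extension selects the output format
doc = aw.Document(filename)
doc.save(filename1)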
  • 1
    Remember that Stack Overflow isn't just intended to solve the immediate problem, but also to help future readers find solutions to similar problems, which requires understanding the underlying code. This is especially important for members of our community who are beginners, and not familiar with the syntax. Given that, **can you [edit] your answer to include an explanation of what you're doing** and why you believe it is the best approach? – Jeremy Caney Jul 07 '22 at 04:59
-3

First, you will need to be using Windows. If that is an acceptable constraint, read on.

Next you need to install the Microsoft Office Compatibility Pack.

Now download and install the Microsoft Office Migration Planning Manager.

To run the tool you need to create a .ini file that controls the program. An example .ini file and further information is available on this blog post. There is more detailed information from Microsoft here.

feedMe