
I want to convert all the .doc files in a particular folder to .docx files.

I tried the following code:

import subprocess
import os
for filename in os.listdir(os.getcwd()):
    if filename.endswith('.doc'):
        print(filename)
        subprocess.call(['soffice', '--headless', '--convert-to', 'docx', filename])

But it gives me an error: OSError: [Errno 2] No such file or directory

sunil pawar

5 Answers


Here is a solution that worked for me. The other solutions proposed did not work on my Windows 10 machine using Python 3.

from glob import glob
import re
import os
import win32com.client as win32
from win32com.client import constants

# Create list of paths to .doc files (recursive=True lets ** match subfolders)
paths = glob(r'C:\path\to\doc\files\**\*.doc', recursive=True)

def save_as_docx(word, path):
    # Word needs an absolute path to open the file
    path = os.path.abspath(path)
    doc = word.Documents.Open(path)
    doc.Activate()

    # Rename path with .docx
    new_file_abs = re.sub(r'\.\w+$', '.docx', path)

    # Save in .docx format, then close without saving changes
    word.ActiveDocument.SaveAs(
        new_file_abs, FileFormat=constants.wdFormatXMLDocument
    )
    doc.Close(False)

# Open MS Word once and reuse it for every file
word = win32.gencache.EnsureDispatch('Word.Application')
try:
    for path in paths:
        save_as_docx(word, path)
finally:
    # Quit Word even if one of the conversions fails
    word.Quit()
dshefman
  • I am getting this error --> com_error: (-2147352567, 'Exception occurred.', (0, 'Microsoft Word', "Sorry, we couldn't find your file. Was it moved, renamed, or deleted?\r (C:\\//Users/shreyajain/Documents/Docum...)", 'wdmain11.chm', 24654, -2146823114), None) Any suggestion? – shreyans jain Jun 10 '20 at 11:46
  • @Shreyansjain Based on the error message, I'm guessing you typed in the file path incorrectly. Although, it's difficult to tell without seeing your code. – dshefman Jun 11 '20 at 14:34
  • 1) This also allows you to convert PDF files into DOCX, so you can read the content of PDF documents. 2) I would suggest adding a `try` at the start of the program to check that MS Word is installed: `MSWord_OK = True` `try: word = win32.gencache.EnsureDispatch('Word.Application')` – Origami Dec 27 '20 at 13:27
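Origami's suggestion of checking for Word up front can be sketched as a helper that returns the Word COM object, or None when it cannot be started. This is a minimal sketch, not the commenter's code: it assumes a missing pywin32 shows up as ImportError and a missing or unregistered Word as an exception from EnsureDispatch (usually pywintypes.com_error). The name get_word_app is hypothetical.

```python
def get_word_app():
    """Return a Word.Application COM object, or None if it cannot be started."""
    try:
        # pywin32 only exists on Windows; import lazily so this check never crashes
        import win32com.client as win32
    except ImportError:
        return None
    try:
        return win32.gencache.EnsureDispatch("Word.Application")
    except Exception:
        # Typically pywintypes.com_error when Word is not installed or registered
        return None
```

A script can then fall back to the soffice approach from the other answers when get_word_app() returns None.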

I prefer the glob module for tasks like this. Put the following in a file named doc2docx.py and make it executable with chmod +x doc2docx.py. Optionally, put that file in a directory on your $PATH as well, to make it available "everywhere".

#!/usr/bin/env python

import glob
import subprocess

for doc in glob.iglob("*.doc"):
    subprocess.call(['soffice', '--headless', '--convert-to', 'docx', doc])

Ideally, though, you'd leave the wildcard expansion to the shell and call doc2docx.py with the files as arguments, as in doc2docx.py *.doc:

#!/usr/bin/env python

import subprocess
import sys

if len(sys.argv) < 2:
    sys.stderr.write("SYNOPSIS: %s file1 [file2] ...\n" % sys.argv[0])
    sys.exit(1)

for doc in sys.argv[1:]:
    subprocess.call(['soffice', '--headless', '--convert-to', 'docx', doc])

As requested by @pyd, to write the output to a target directory myoutputdir, use:

#!/usr/bin/env python

import subprocess
import sys

if len(sys.argv) < 2:
    sys.stderr.write("SYNOPSIS: %s file1 [file2] ...\n" % sys.argv[0])
    sys.exit(1)

for doc in sys.argv[1:]:
    subprocess.call(['soffice', '--headless', '--convert-to', 'docx', '--outdir', 'myoutputdir', doc])
Jan Christoph Terasa
  • From my tests, this only fails when the working/target directory in question is the root of the filesystem, e.g. directly `C:\` or `D:\`. Any other folder works fine. Looks like a bug in `soffice`. You can specify the output directory with the option `--outdir`. – Jan Christoph Terasa Jan 03 '18 at 09:24
  • Do I need to pass one more argument? Can you edit your answer? – Pyd Jan 03 '18 at 09:30
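Instead of hard-coding myoutputdir, the output directory can be made a command-line option. A minimal sketch using argparse: the script's -o/--outdir option and the helper names build_commands and main are hypothetical, while the --outdir flag passed to soffice is the LibreOffice option used above. argparse also replaces the manual SYNOPSIS check, exiting with a usage message when no files are given.

```python
#!/usr/bin/env python
import argparse
import subprocess
import sys


def build_commands(argv):
    """Parse command-line arguments and return one soffice command per file."""
    parser = argparse.ArgumentParser(description="Convert .doc files to .docx with LibreOffice")
    parser.add_argument("files", nargs="+", help="input .doc files, e.g. *.doc")
    parser.add_argument("-o", "--outdir", help="directory for the .docx output")
    args = parser.parse_args(argv)

    base = ["soffice", "--headless", "--convert-to", "docx"]
    if args.outdir:
        base += ["--outdir", args.outdir]
    return [base + [doc] for doc in args.files]


def main(argv=None):
    status = 0
    for cmd in build_commands(argv):
        # subprocess.call returns soffice's exit code; remember any failure
        if subprocess.call(cmd) != 0:
            status = 1
    return status


if __name__ == "__main__":
    sys.exit(main())
```

Usage would then be doc2docx.py -o myoutputdir *.doc, or plain doc2docx.py *.doc to keep soffice's default output location.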

If you'd rather not rely on subprocess calls, here is a version using the COM client. It is useful if you are targeting Windows users who don't have LibreOffice installed.

#!/usr/bin/env python

import glob
import os
import win32com.client

word = win32com.client.Dispatch("Word.Application")
word.Visible = False

try:
    for i, doc in enumerate(glob.iglob("*.doc")):
        # Word needs absolute paths for both opening and saving
        in_file = os.path.abspath(doc)
        wb = word.Documents.Open(in_file)
        out_file = os.path.abspath("out{}.docx".format(i))
        wb.SaveAs2(out_file, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
        wb.Close()
finally:
    # Close Word even if one of the conversions fails
    word.Quit()
James Parker
  • It is clean. However, I wonder whether there is any platform-independent way to convert .doc into .docx. – longbowking Apr 30 '19 at 07:54
  • @longbowking There was no Swiss Army knife library to take care of this when I looked last year. One possible method is to detect the OS with `sys.platform` and use Jan Christoph Terasa's approach on Linux and mine on Windows. Not sure what works on Mac. – James Parker Apr 30 '19 at 12:16
  • Just tried unoconv with [this docker image](https://hub.docker.com/r/zrrrzzt/docker-unoconv-webservice), doc -> docx, but the resulting docx was damaged (files contained comments that I needed to preserve). – lucid_dreamer May 12 '19 at 23:16
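The sys.platform idea from the comments can be sketched as a small dispatch function. It assumes Word via pywin32 on Windows and LibreOffice's soffice everywhere else, including macOS, where LibreOffice also runs but soffice may not be on PATH by default. The name pick_converter and the returned labels are hypothetical.

```python
import sys


def pick_converter(platform_name=sys.platform):
    """Choose a conversion backend from the OS name.

    Assumption: Word (via pywin32) on Windows, LibreOffice's soffice elsewhere.
    """
    if platform_name.startswith("win"):
        return "word"
    # Linux reports "linux", macOS reports "darwin"; LibreOffice runs on both
    return "soffice"
```

A script would call the COM-based function when this returns "word" and the subprocess-based one otherwise.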

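Based on dshefman's code, this version takes the folder as a command-line argument and converts the .doc files in it and all of its subfolders: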

import re
import os
import sys
import win32com.client as win32
from win32com.client import constants

def save_as_docx(word, path):
    # Open the document in the running MS Word instance
    doc = word.Documents.Open(path)
    doc.Activate()

    # Rename path with .docx
    new_file_abs = os.path.abspath(path)
    new_file_abs = re.sub(r'\.\w+$', '.docx', new_file_abs)

    # Save and Close
    word.ActiveDocument.SaveAs(new_file_abs, FileFormat=constants.wdFormatXMLDocument)
    doc.Close(False)

def main():
    # Get the folder from the command-line argument; Word needs absolute paths
    source = os.path.abspath(sys.argv[1])

    # Start MS Word once and reuse it for every file
    word = win32.gencache.EnsureDispatch('Word.Application')
    try:
        for root, dirs, filenames in os.walk(source):
            for f in filenames:
                file_extension = os.path.splitext(f)[1]

                if file_extension.lower() == ".doc":
                    file_conv = os.path.join(root, f)
                    save_as_docx(word, file_conv)
                    print("%s ==> %sx" % (file_conv, f))
    finally:
        # Quit Word even if one of the conversions fails
        word.Quit()

if __name__ == "__main__":
    main()
Anguo Zhao
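For comparison, the recursive search that os.walk performs in the answer above can also be written with pathlib. A minimal sketch, assuming you only want (source, target) path pairs to feed into a converter such as save_as_docx; the name doc_to_docx_pairs is hypothetical.

```python
from pathlib import Path


def doc_to_docx_pairs(root):
    """Yield (source .doc path, target .docx path) for every .doc under root.

    rglob("*") walks all subfolders. The suffix check is case-insensitive,
    so FILE.DOC is included, and .docx files are skipped.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".doc":
            # Word's COM API wants absolute paths, so resolve before returning
            src = path.resolve()
            yield src, src.with_suffix(".docx")
```

The target keeps the original file name and folder, which matches the regex rename used in the Word-based answers.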

Use os.path.join to specify the correct directory.

import os, subprocess

main_dir = os.path.join('/', 'Users', 'username', 'Desktop', 'foldername')

for filename in os.listdir(main_dir):
    if filename.endswith('.doc'):
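        print(filename)
        # Pass the full path so soffice finds the file, and write the .docx next to it
        subprocess.call(['soffice', '--headless', '--convert-to', 'docx', '--outdir', main_dir, os.path.join(main_dir, filename)])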
p-robot
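The OSError: [Errno 2] in the question is what subprocess raises when the program it is asked to run (here soffice) cannot be found. A sketch that checks for soffice up front with shutil.which and always passes full paths, so the script does not depend on the current working directory. find_soffice, plan_conversions and convert_folder are hypothetical names; on macOS the binary is typically at /Applications/LibreOffice.app/Contents/MacOS/soffice, which you can pass as soffice=... if it is not on PATH.

```python
import os
import shutil
import subprocess


def find_soffice(name="soffice"):
    """Return the full path of the LibreOffice binary, or raise a clear error."""
    path = shutil.which(name)
    if path is None:
        # This is the situation behind "OSError: [Errno 2] No such file or directory"
        raise FileNotFoundError(f"{name!r} not found on PATH; pass the full path to soffice instead")
    return path


def plan_conversions(main_dir, soffice):
    """Build one soffice command per .doc file in main_dir, using full paths."""
    commands = []
    for filename in sorted(os.listdir(main_dir)):
        if filename.lower().endswith(".doc"):
            src = os.path.join(main_dir, filename)  # full path, not the bare filename
            commands.append([soffice, "--headless", "--convert-to", "docx", "--outdir", main_dir, src])
    return commands


def convert_folder(main_dir, soffice=None):
    """Convert every .doc in main_dir; the .docx files are written to main_dir too."""
    for cmd in plan_conversions(main_dir, soffice or find_soffice()):
        subprocess.call(cmd)
```

Splitting planning from running makes it easy to print or inspect the commands before actually invoking LibreOffice.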