I am using win32com to convert a .docx file into a .txt file. It works well until the file name contains non-ASCII Spanish characters (such as the "í" in the path below).

DOC_FILEPATH = r"C:\Temp\Hugo- Ortíz -.docx"
s = find_between_r(DOC_FILEPATH, '.', '')
FILETXT = DOC_FILEPATH.strip(s)
FILETXT = FILETXT + "txt"
doc = win32com.client.GetObject(DOC_FILEPATH) 
text = doc.Range().Text 
with open(FILETXT, "wb") as f:
   f.write(text.encode("utf-8"))

When win32com.client opens DOC_FILEPATH, I get this error (the Spanish message means "The moniker cannot open a file"):

moniker, i, bindCtx = pythoncom.MkParseDisplayName(Pathname)
pywintypes.com_error: (-2147221014, 'El moniker no puede abrir un archivo', None, None)

Is there a way to read that file without changing the name?

m00am
AshMGM

1 Answer

This is not how Word Automation works. Check Word Object Model ([MS.Docs]: Word) for more details.

You should create a Word.Application instance, and that will deal with the documents.

I adapted [SO]: Python - Using win32com.client to accept all changes in Word Documents and tested it on a dummy doc for you.

code.py:

#!/usr/bin/env python3
# -*- coding: cp1252 -*-

import sys
import os
import win32com.client as w32comcl


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    doc_path = r"Documento ficticío.docx"
    txt_path = os.path.splitext(doc_path)[0] + ".txt"
    word = w32comcl.Dispatch("Word.Application")
    try:
        word.Visible = False
        doc = word.Documents.Open(os.path.abspath(doc_path))
        try:
            text = doc.Range().Text
            with open(txt_path, "wb") as f:
                f.write(text.encode("utf8"))
        finally:
            doc.Close(False)
    finally:
        word.Application.Quit()
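The output path in the script above comes from os.path.splitext, whereas the question builds it with str.strip. As a minimal sketch of why the two differ: str.strip treats its argument as a set of characters to remove from both ends, not as a suffix, so it can eat part of the file name. The helper name txt_path_for is hypothetical, and PureWindowsPath is used only so the example behaves the same on any platform.

```python
from pathlib import PureWindowsPath


def txt_path_for(doc_path: str) -> str:
    """Return doc_path with its final extension replaced by .txt.

    Hypothetical helper for illustration; it only manipulates the string
    form of the path and never touches the file system.
    """
    return str(PureWindowsPath(doc_path).with_suffix(".txt"))


# The path from the question, including the non-ASCII character.
print(txt_path_for(r"C:\Temp\Hugo- Ortíz -.docx"))

# str.strip removes any of the given characters from both ends,
# so a name that starts or ends with d, o, c or x gets damaged.
print("docs.docx".strip("docx"))
```

The second print shows the file name itself being trimmed, which is the risk with the strip-based approach in the question.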

Notes:

Output:

(py35x64_test) e:\Work\Dev\StackOverflow\q049179872>dir /b
code.py
Documento ficticío.docx

(py35x64_test) e:\Work\Dev\StackOverflow\q049179872>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32


(py35x64_test) e:\Work\Dev\StackOverflow\q049179872>dir /b
code.py
Documento ficticío.docx
Documento ficticío.txt

(py35x64_test) e:\Work\Dev\StackOverflow\q049179872>type "Documento ficticío.txt"
Párrafo ficticío0: 1234567890qwertyuioopasdfghjklzxcvbnm.
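A possible refinement that is not part of the tested script: Word ends each paragraph in Range().Text with a carriage return ("\r"), so a multi-paragraph document written as raw bytes may look like a single line in some editors. The sketch below converts those marks to line feeds and writes UTF-8 in text mode. The function names are hypothetical, and the sample string stands in for the text that Word would return.

```python
import os
import tempfile


def normalize_paragraph_marks(text: str) -> str:
    """Replace carriage-return paragraph marks with line feeds."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def write_text_utf8(path: str, text: str) -> None:
    """Write text as UTF-8 after normalizing its paragraph marks."""
    # newline="" keeps Python from translating "\n" a second time on Windows.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(normalize_paragraph_marks(text))


# Stand-in for doc.Range().Text; a real run would take it from Word.
sample = "Párrafo uno.\rPárrafo dos.\r"
out_path = os.path.join(tempfile.mkdtemp(), "salida.txt")
write_text_utf8(out_path, sample)

with open(out_path, "rb") as f:
    print(f.read().decode("utf-8").splitlines())
```

Whether to normalize is a judgment call: keeping the raw "\r" preserves exactly what Word returned, while "\n" is friendlier to most text tools.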
CristiFati