I want to convert multiple txt files to docx. I use this code:

from docx import Document
import re
import os

path = 'd://2022_12_02'
direct = os.listdir(path)

for i in direct:
    document = Document()
    document.add_heading(i, 0)
    myfile = open('d://2022_12_02'+i).read()
    myfile = re.sub(r'[^\x00-\x7F]+|\x0c',' ', myfile) # remove all non-XML-compatible characters
    p = document.add_paragraph(myfile)
    document.save('d://2022_12_02'+i+'.docx')

After running it, I get this error:

Traceback (most recent call last):
  File "D:\convert txt to docs.py", line 4, in <module>
    from docx import Document
  File "C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\docx.py", line 30, in <module>
    from exceptions import PendingDeprecationWarning
ModuleNotFoundError: No module named 'exceptions'
>>> 

Also, in the docx module, this line is underlined in red:

from exceptions import PendingDeprecationWarning

Toto
Just Me
  • Try pip installing [python-docx](https://stackoverflow.com/a/44233838/17200348) – B Remmelzwaal Jan 28 '23 at 16:36
  • Yes, it seems to work if I use `pip install python-docx`. But now the path is rejected even though it is correct. I believe Python wants the path written a different way. `OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'd:\\022_12_02'` – Just Me Jan 28 '23 at 16:40
  • Have you tried writing the path using a raw string literal like `r'd:\2022_12_02'`? Maybe that is causing an issue. – B Remmelzwaal Jan 28 '23 at 16:42
  • see this, https://snipboard.io/H7gs6k.jpg – Just Me Jan 28 '23 at 16:45
  • And you're not using the path variable there again because? – B Remmelzwaal Jan 28 '23 at 16:47
  • I tried all combinations of the same path, so maybe it is something else? Maybe I must ignore UTF-8 errors when opening, something like: `f.write(text.encode('utf8', 'ignore'))`. But this also does not work. – Just Me Jan 28 '23 at 16:53
  • Have you tried `myfile = open(path).read()` at all? – B Remmelzwaal Jan 28 '23 at 16:56
  • It seems that Python sees the file but cannot open it: `FileNotFoundError: [Errno 2] No such file or directory: 'd:\x822_12_025g.txt'` – Just Me Jan 28 '23 at 16:57
  • I tried `myfile = open(path).read()` and I get this error: `PermissionError: [Errno 13] Permission denied: 'd:\\2022_12_02'` – Just Me Jan 28 '23 at 16:58
  • And `path` is still the raw, unescaped string? Seems to be some weird artifacts in the string. – B Remmelzwaal Jan 28 '23 at 17:00
  • Can you test the Python code yourself? I don't know where the problem is. – Just Me Jan 28 '23 at 17:02
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/251458/discussion-between-just-me-and-b-remmelzwaal). – Just Me Jan 28 '23 at 17:08
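
The path errors quoted in the comments are consistent with two separate issues: a non-raw string literal in which `\202` is read as an octal escape, and concatenating the folder and the file name without a separator. The sketch below only illustrates both effects. It uses `PureWindowsPath` so it runs on any operating system, and the file name `5g.txt` is taken from the error message above; it is not a fix confirmed by the thread.

```python
from pathlib import PureWindowsPath

# In a normal string literal, '\202' is an octal escape for character 0x82,
# so the folder name is silently corrupted.
plain = 'd:\2022_12_02'
# In a raw string literal, the backslash stays a backslash.
raw = r'd:\2022_12_02'

print(repr(plain))  # 'd:\x822_12_02'
print(repr(raw))    # 'd:\\2022_12_02'

name = '5g.txt'  # file name seen in the FileNotFoundError above

# Plain concatenation glues the file name onto the folder name.
joined_badly = raw + name
# Joining path components inserts the separator.
joined = str(PureWindowsPath(raw) / name)

print(joined_badly)  # d:\2022_12_025g.txt
print(joined)        # d:\2022_12_02\5g.txt
```

In a real script on Windows, `os.path.join(path, i)` or `Path(path) / i` would play the role of `PureWindowsPath` here. Note also that `open(path)` on the folder itself cannot work, because `path` names a directory rather than a file.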

1 Answer

B Remmelzwaal's solution works well. You need to install this library:

pip install python-docx
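
The traceback in the question points at a single-file `docx.py` that imports the `exceptions` module, which exists only in Python 2. That suggests the legacy `docx` distribution was installed instead of `python-docx`. A minimal sketch for checking which of the two is present, assuming Python 3.8 or newer (the helper name `installed_version` is made up for this example):

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Both distributions provide a module named "docx", so they can shadow each other.
for name in ("docx", "python-docx"):
    print(name, "->", installed_version(name) or "not installed")
```

If the legacy `docx` distribution shows up, removing it with `pip uninstall docx` before installing `python-docx` is probably the safest order.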

I also rewrote the code into a better version, for anyone who wants to convert .txt text files into .docx Word files:

# Requires: pip install python-docx
# (do not install the unrelated "docx" package; pathlib is part of the standard library)
 
import os
import sys
from pathlib import Path

from docx import Document
 
# Folder that contains the input files
input_path = r'c:\Folder7\input'
# Folder where the .docx files will be written
output_path = r'c:\Folder7\output'
# Create the output folder structure if it does not exist
os.makedirs(output_path, exist_ok=True)

# Check that the input folder exists
directory_path = Path(input_path)
if directory_path.exists() and directory_path.is_dir():
    print(directory_path, "exists")
else:
    print(directory_path, "is invalid")
    sys.exit(1)
 
for file_path in directory_path.glob("*.txt"):
    # file_path is a Path object; only .txt files are matched,
    # so subfolders and other file types are skipped

    print("Processing file:", file_path)
    document = Document()
    # file_path.name is the name of the file as str without the Path
    document.add_heading(file_path.name, 0)
 
    file_content = file_path.read_text(encoding='UTF-8')
    document.add_paragraph(file_content)
 
    # build the new path where we store the files
    output_file_path = os.path.join(output_path, file_path.name + ".docx")
 
    document.save(output_file_path)
    print("Converted file:", file_path, "to:", output_file_path)
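
The code in the question stripped characters before calling `add_paragraph`, while the version above does not. As far as I know, python-docx (through lxml) rejects strings containing NULL bytes or control characters such as form feed, so some files may still fail. A minimal sketch of a gentler filter, assuming the XML 1.0 character ranges and keeping non-ASCII text intact (the helper name `xml_safe` is made up for this example):

```python
import re

# Characters outside the XML 1.0 "Char" ranges: control characters other than
# tab, newline and carriage return, plus surrogates and U+FFFE/U+FFFF.
_XML_ILLEGAL = re.compile(
    '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]'
)


def xml_safe(text):
    """Replace characters that XML 1.0 does not allow with a space."""
    return _XML_ILLEGAL.sub(' ', text)


print(repr(xml_safe('page one\x0cpage two')))
```

In the loop above, this would be applied as `document.add_paragraph(xml_safe(file_content))`. Unlike the regular expression in the question, it does not remove accented or other non-ASCII letters.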

Just Me