I have a docx file that I read in PyCharm using textract. The docx contains text with multiple paragraphs. I want to detect every paragraph break and store each paragraph as a string, either in separate variables or in a list, for later use.

How can I do that in Python 3?

Please help!

I haven't found anything on this.

1 Answer

You can do this with the Document class from the python-docx package (installed with pip install python-docx, imported as docx):

from docx import Document
document = Document('path/to/your/file.docx')
paragraphs = [para.text for para in document.paragraphs]
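
If the text has already been extracted as one string (for example with textract, as in the question), a rough alternative is to split that string on blank lines. This is only a sketch: it assumes the extractor separates paragraphs with one or more blank lines, which may not hold for every file, and `split_paragraphs` is a hypothetical helper name, not part of any library.

```python
import re


def split_paragraphs(text):
    """Split extracted text into a list of non-empty paragraph strings.

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Normalise Windows and old Mac line endings so only "\n" remains.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Treat a newline, optional whitespace, and another newline as a break.
    chunks = re.split(r"\n\s*\n", text)
    # Drop empty chunks and trim surrounding whitespace.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Made-up sample text standing in for the extracted document.
sample = "First paragraph.\n\nSecond paragraph,\nstill second.\n\n\nThird."
paragraphs = split_paragraphs(sample)
```

If the extractor returns bytes rather than str, decode it first (for example with `.decode("utf-8")`, assuming the output is UTF-8) before passing it to the helper.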
David Meu
  • Not working. I get this error: Traceback (most recent call last): File "C:\Users\Admin\PycharmProjects\pythonProject\test 1.py", line 1, in <module> from docx import Document File "C:\Users\Admin\PycharmProjects\pythonProject\venv\Lib\site-packages\docx.py", line 30, in <module> from exceptions import PendingDeprecationWarning ModuleNotFoundError: No module named 'exceptions' – Swapnil MIB Jan 15 '23 at 07:10
  • Try this: https://stackoverflow.com/questions/22765313/when-import-docx-in-python3-3-i-have-error-importerror-no-module-named-excepti – David Meu Jan 15 '23 at 07:12