
I have a question about using Python to identify text with certain features in a Word document.

I want to extract text that is bold and surrounded by quotation marks. For example:

" This is a "sentence" in word document. "

How can I identify the word "sentence" in Python?

This is what I have at the moment:

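from docx import Document

# filepath is the path to the .docx file to read
document = Document(filepath)
short_list = []
for paragraph in document.paragraphs:
    for run in paragraph.runs:
        # run.bold is True only when bold is set directly on the run
        if run.bold:
            short_list.append(run.text)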
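
One possible way to add the quotation check, as a rough sketch only: the helper name `quoted_bold_texts` and the `(text, bold)` pair format are hypothetical, not part of python-docx. It assumes the quotation marks sit either inside the bold run itself or at the edges of the runs directly before and after it, which depends on how Word happened to split the paragraph into runs.

```python
# Straight and curly double quotes; extend if the document uses other marks.
QUOTES = ('"', '\u201c', '\u201d')


def quoted_bold_texts(runs):
    """Return bold run texts that are wrapped in quotation marks.

    runs is a list of (text, bold) pairs for one paragraph (hypothetical format).
    """
    found = []
    for i, (text, bold) in enumerate(runs):
        if not bold or not text:
            continue
        before = runs[i - 1][0] if i > 0 else ""
        after = runs[i + 1][0] if i + 1 < len(runs) else ""
        if len(text) >= 2 and text.startswith(QUOTES) and text.endswith(QUOTES):
            # The quotes are part of the bold run itself.
            found.append(text[1:-1])
        elif before.endswith(QUOTES) and after.startswith(QUOTES):
            # The quotes live in the neighbouring runs.
            found.append(text)
    return found
```

With python-docx, the pairs could be built per paragraph as `[(run.text, bool(run.bold)) for run in paragraph.runs]`. Note that `run.bold` is `None` when bold is inherited from a style, so text made bold through a style would be missed by this check.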

Thank you all for your help!

Dazz W

2 Answers


I would assume you cannot.

A docx file is in fact a zip file, and according to the documentation of the Python docx module, the Document object represents the document.xml part of the file. Unfortunately, footnotes are stored in a different part: footnotes.xml.
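
To illustrate the zip layout described above, here is a minimal sketch that bypasses python-docx and reads the footnotes part directly with the standard library. The helper name `read_footnote_texts` is hypothetical, and the sketch assumes the part is stored at `word/footnotes.xml` inside the archive and that only plain text is wanted.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, shared by document.xml and footnotes.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_footnote_texts(docx_path):
    """Return the plain text of each non-empty footnote (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as archive:
        # A document without footnotes may have no footnotes part at all.
        if "word/footnotes.xml" not in archive.namelist():
            return []
        root = ET.fromstring(archive.read("word/footnotes.xml"))

    texts = []
    for footnote in root.iter(f"{{{W_NS}}}footnote"):
        # Join every w:t text node found inside this footnote.
        text = "".join(t.text or "" for t in footnote.iter(f"{{{W_NS}}}t"))
        if text:  # skip entries with no text, such as separator footnotes
            texts.append(text)
    return texts
```

This only extracts text; formatting and the link between a footnote and its reference mark in the body are not recovered.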

Since the module declares its development status on PyPI as 3 - Alpha, I suppose that it cannot currently process footnotes.

IMHO, you should first check whether there is already an open issue about this. If there is, comment on it; otherwise, file a new issue on the project page.

Serge Ballesta

Try the example code below:

for paragraph in document.paragraphs:
    if 'sea' in paragraph.text:
        print(paragraph.text)
        # Assigning to paragraph.text replaces the whole paragraph's text
        paragraph.text = 'new text containing ocean'

To search in tables as well, you would need to use something like:

for table in document.tables:
    # Tables expose their cells through rows (or columns), not directly
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                if 'sea' in paragraph.text:
                    ...

See "How to use python-docx to replace text in a Word document and save".
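
The body and table loops above could be folded into one generator, sketched here under some assumptions: the name `iter_paragraphs` is hypothetical, and the function relies only on the `paragraphs`, `tables`, `rows`, and `cells` attributes that python-docx document, table, and cell objects expose. It yields body paragraphs first and table paragraphs afterwards, so the result is not in document order.

```python
def iter_paragraphs(container):
    """Yield every paragraph in a document or table cell, descending into tables.

    container is expected to behave like a python-docx Document or table cell,
    i.e. to have .paragraphs and .tables attributes.
    """
    yield from container.paragraphs
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                # A cell can itself contain nested tables, so recurse.
                yield from iter_paragraphs(cell)
```

A search then becomes a single loop, for example `[p.text for p in iter_paragraphs(document) if 'sea' in p.text]`. Merged cells may be reported more than once by `row.cells`, so duplicates are possible.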

tripleee
autom99
  • Why should @https://stackoverflow.com/users/874188/tripleee downvote? Is there any problem with the code? Think twice before downvoting anything. – autom99 Jan 24 '20 at 12:39
  • I did not downvote, but you are not addressing the OP's specific question. Theirs is about footnotes, while your answer is about tables. And tables are inside the document part, while footnotes are not. – Serge Ballesta Jan 24 '20 at 13:20