This is what I have so far:

from docx import Document

document = Document('filename.docx')

# Map each search string to its replacement.
dic = {
    'Stack': 'Stack Overflow',
    'October 18 2021': 'Actual Date',
}

for p in document.paragraphs:
    # Search and replace inside each run separately.
    for run in p.runs:
        for key, value in dic.items():
            if key in run.text:
                run.text = run.text.replace(key, value)

document.save('new.docx')

This seems to work fine when I need to replace a single word, but when I need to replace a longer phrase (here, October 18 2021), it doesn't work.

Any ideas why replacing sentences doesn't work?

RandallCloud
  • The word doc is an xml file internally. The sentences usually have additional non-printing elements that mean that finding a match to replace is non-trivial. – jwal Oct 18 '21 at 16:07
  • What do you mean by non-trivial ? – RandallCloud Oct 18 '21 at 16:12
  • A simple sentence (```This is non-trivial```) where I applied and removed some formatting ```This is non-trivial``` – jwal Oct 19 '21 at 20:51

1 Answer

The problem is that parts of the sentences you're searching for are actually split across different runs, so no single run contains the whole phrase.

As stated by scanny in this post:

So runs can effectively break up the text of a paragraph at arbitrary locations, even one run per character. In short, Word doesn't try to keep track of sentences; if you see a run that is a sentence that is pure coincidence.
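
To make this concrete, here is a small sketch that uses plain strings as stand-ins for the run texts of one paragraph. The split points are hypothetical (Word may split the text anywhere); the point is only that a per-run search can miss a phrase that is clearly visible in the paragraph as a whole.

```python
# Hypothetical run texts for a single paragraph; the real split points
# depend on the document's editing and formatting history.
run_texts = ["Meeting on October ", "18", " 2021 at Stack"]

key = "October 18 2021"

# Searching run by run, as in the question's loop, never sees the whole phrase.
found_in_a_run = any(key in text for text in run_texts)

# The paragraph text is essentially the run texts joined together,
# so the phrase does show up there.
paragraph_text = "".join(run_texts)
found_in_paragraph = key in paragraph_text

print(found_in_a_run, found_in_paragraph)
```

A short key such as 'Stack' is more likely to sit entirely inside one run, which would explain why single words appeared to work.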

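One easy way to solve this problem is to do your search and replace on paragraph.text instead of on the text of each individual run. Be aware that assigning to paragraph.text replaces the paragraph's content with a single run, so run-level formatting such as bold or italic is lost (paragraph-level style is kept).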

from docx import Document

document = Document('test.docx')

dic = {
    'Stack':'Stack Overflow',
    'October 18 2021' : 'Actual Date'
}
for p in document.paragraphs:
    for key, value in dic.items():
        if key in p.text:
            p.text = p.text.replace(key, value)

document.save('new.docx')
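
If the character formatting of the surrounding text matters, one possible alternative is to replace across runs without rebuilding the paragraph. The following is only a sketch, not part of python-docx: replace_across_runs is a hypothetical helper that works on any list of objects with a writable .text attribute, and the demo uses SimpleNamespace stand-ins rather than a real document. The replacement text ends up in the first run that the match touches, so it takes that run's formatting.

```python
from types import SimpleNamespace


def replace_across_runs(runs, old, new):
    """Replace each occurrence of `old` with `new`, even if `old` spans runs.

    `runs` is a list of objects with a writable `.text` attribute
    (with python-docx this would be `paragraph.runs`).
    Returns the number of replacements made.
    """
    count = 0
    if not old:
        return count
    search_from = 0
    while True:
        texts = [run.text for run in runs]
        full = "".join(texts)
        start = full.find(old, search_from)
        if start == -1:
            return count
        end = start + len(old)

        pos = 0
        first = True
        for run, text in zip(runs, texts):
            run_start, run_end = pos, pos + len(text)
            pos = run_end
            if run_end <= start or run_start >= end:
                continue  # this run does not overlap the match
            lo = max(start, run_start) - run_start
            hi = min(end, run_end) - run_start
            # The first overlapping run receives the replacement text;
            # later overlapping runs only lose their share of the match.
            run.text = text[:lo] + (new if first else "") + text[hi:]
            first = False

        count += 1
        # Resume after the inserted text so that a replacement containing
        # the key ('Stack' -> 'Stack Overflow') is not matched again.
        search_from = start + len(new)


# Demo with stand-in runs (hypothetical split points).
runs = [SimpleNamespace(text=t)
        for t in ["Meeting on October ", "18", " 2021 at Stack"]]
dic = {"Stack": "Stack Overflow", "October 18 2021": "Actual Date"}
for key, value in dic.items():
    replace_across_runs(runs, key, value)
print([run.text for run in runs])
```

With a real document, the same helper would be called as replace_across_runs(p.runs, key, value) for each p in document.paragraphs. It does not look inside tables, headers, or footers, which are not part of document.paragraphs.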
S. Ferard