
I am using python-docx to manipulate Word documents. Here is what I currently have to modify text in normal paragraphs:

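```python
from docx import Document

doc = Document('idk.docx')
for paragraph in doc.paragraphs:
    # Paragraph has no replace() method; rewrite the text through the .text property
    if "oldtext1" in paragraph.text:
        paragraph.text = paragraph.text.replace("oldtext1", "Something")
    if "oldtext2" in paragraph.text:
        paragraph.text = paragraph.text.replace("oldtext2", "Somethingelse")
```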

If I want to modify the text in a table, I need to do the following:

```python
tables = doc.tables
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                if "oldtext1" in paragraph.text:
                    paragraph.text = paragraph.text.replace("oldtext1", "Something")
                if "oldtext2" in paragraph.text:
                    paragraph.text = paragraph.text.replace("oldtext2", "Somethingelse")
```

The code works and the text is replaced, but I am trying to replace ALL instances of the text in the document, and I do not want two separate loops (one for normal text in paragraphs and another for text in tables).

Is there an easy way to combine these loops so that I don't have to repeat the same if statements in two different loops?

Bijan
  • You are looping over different things, so I don't see anything wrong with this code – OneCricketeer Mar 15 '16 at 23:20
  • @cricket_007 I agree, although I would recommend putting the per-paragraph processing in a function to avoid code repetition – DaveBensonPhillips Mar 15 '16 at 23:29
  • @HumphreyTriscuit - I was going to say that, but that's a personal preference and I wasn't sure both blocks would be the same – OneCricketeer Mar 15 '16 at 23:30
  • @cricket_007: The code I currently have is fine, but there are going to be a lot more if statements and I don't want them to exist in both sets of loops since they will be identical – Bijan Mar 15 '16 at 23:37
  • @Bijan cricket is correct, though; your loops are fine. If you want fewer `if` statements then put that logic in a function and call it from each of your loops. – DaveBensonPhillips Mar 15 '16 at 23:39
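Following the suggestion in the comments, one way to keep a growing list of `if` statements in a single place is to drive the replacements from a dict and apply them in one helper that both loops call. This is a minimal sketch: `REPLACEMENTS` and `replace_in_paragraph` are hypothetical names, and the helper assumes only a readable and writable `.text` attribute (which python-docx's `Paragraph` provides). The `FakeParagraph` class is a stand-in so the snippet runs without a .docx file.

```python
from dataclasses import dataclass

# Hypothetical mapping: add new pairs here instead of writing new if statements
REPLACEMENTS = {
    "oldtext1": "Something",
    "oldtext2": "Somethingelse",
}


def replace_in_paragraph(paragraph, replacements=REPLACEMENTS):
    """Apply every old -> new pair to paragraph.text; return True if it changed.

    Only assigns .text when something changed: in python-docx, assigning
    .text rebuilds the paragraph as a single run, dropping character formatting.
    """
    new_text = paragraph.text
    for old, new in replacements.items():  # applied in dict insertion order
        new_text = new_text.replace(old, new)
    if new_text != paragraph.text:
        paragraph.text = new_text
        return True
    return False


@dataclass
class FakeParagraph:
    """Stand-in for a python-docx Paragraph (only .text is modelled)."""
    text: str


p = FakeParagraph("see oldtext1 and oldtext2")
replace_in_paragraph(p)
print(p.text)  # see Something and Somethingelse
```

Each of the original loops then shrinks to a single `replace_in_paragraph(paragraph)` call. Because the pairs are applied one after another, a replacement value that contains a later key would itself be replaced.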

2 Answers


I would just chain the body paragraphs with a generator expression over the table cells:

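```python
from itertools import chain

table_paragraphs = (paragraph for table in doc.tables for row in table.rows
                    for cell in row.cells for paragraph in cell.paragraphs)

for paragraph in chain(doc.paragraphs, table_paragraphs):
    new_text = paragraph.text.replace("oldtext1", "Something").replace("oldtext2", "Somethingelse")
    if new_text != paragraph.text:  # Paragraph has no replace(); only rewrite changed paragraphs
        paragraph.text = new_text
```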

Note that you don't need an `in` check before each replacement: replacing a substring that isn't present simply leaves the text unchanged.
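To see exactly what the chained iterator yields, here is a self-contained sketch that models python-docx's object graph with small stand-in classes. `Doc`, `Table`, `Row`, `Cell` and `Para` are hypothetical and mirror only the `.paragraphs`, `.tables`, `.rows`, `.cells` and `.text` attributes used above. Body paragraphs come out first, then table paragraphs in table, row, cell order, and nothing is built up front because both `chain` and the generator expression are lazy.

```python
from dataclasses import dataclass, field
from itertools import chain


# Minimal stand-ins for the python-docx objects used in the answer
@dataclass
class Para:
    text: str


@dataclass
class Cell:
    paragraphs: list


@dataclass
class Row:
    cells: list


@dataclass
class Table:
    rows: list


@dataclass
class Doc:
    paragraphs: list
    tables: list = field(default_factory=list)


def all_paragraphs(doc):
    # Body paragraphs first, then every paragraph of every cell, lazily
    return chain(
        doc.paragraphs,
        (paragraph
         for table in doc.tables
         for row in table.rows
         for cell in row.cells
         for paragraph in cell.paragraphs),
    )


doc = Doc(
    paragraphs=[Para("body 1"), Para("body 2")],
    tables=[
        Table(rows=[
            Row(cells=[Cell([Para("r1c1")]), Cell([Para("r1c2")])]),
            Row(cells=[Cell([Para("r2c1")])]),
        ]),
    ],
)
print([p.text for p in all_paragraphs(doc)])
# ['body 1', 'body 2', 'r1c1', 'r1c2', 'r2c1']
```

Swapping the two arguments to `chain` would visit the table paragraphs first.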

DaveBensonPhillips
  • So does this first go through all the paragraphs and then go through all the text in cells? – Bijan Mar 15 '16 at 23:41
  • @Bijan Yes, but if you switch the order of addition it will do the opposite – DaveBensonPhillips Mar 15 '16 at 23:43
  • Note: Using [`itertools.chain`](https://docs.python.org/3/library/itertools.html#itertools.chain) and changing the listcomp to a genexpr would allow you to iterate lazily, instead of creating 2 potentially huge temporary `list`s up front. Also need to reverse order of loops. `from itertools import chain`, `for paragraph in chain(doc.paragraphs, (para for table in doc.tables for row in table.rows for cell in row.cells for para in cell.paragraphs)):` will get the same results (unless early loop mutates one of the values in `doc.tables`), but produce initial results faster with lower peak memory. – ShadowRanger Mar 16 '16 at 02:10
  • @HumphreyTriscuit: I kinda glossed over it in my last comment, but you have the order of the loops in your listcomp backwards. The leftmost, not rightmost, `for` should be over `doc.tables`, then `table.rows`, etc. – ShadowRanger Mar 16 '16 at 02:15
  • @ShadowRanger You're right, thanks! Fixed it up. Also completely agree on using `itertools.chain()`, I'll add that to my answer. Thanks again – DaveBensonPhillips Mar 16 '16 at 02:31
  • @JaredGoguen OK, and there are lots of things I would do, too. This is just a proof of concept that helps to answer his question – DaveBensonPhillips Mar 16 '16 at 14:39

While a generator expression works fine, it might be cleaner to delegate this task to its own generator function, which is considerably more readable.

```python
# Python 2.X
def get_all_paragraphs(document):
    for paragraph in document.paragraphs:
        yield paragraph

    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    yield paragraph
```

In Python 3.3+ this can be tidied up with the `yield from` construct.

```python
# Python 3.X
def get_all_paragraphs(document):
    yield from document.paragraphs

    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from cell.paragraphs
```

I can't think of a way to avoid the nested `for row in table.rows: for cell in row.cells:` pattern, however.

The usage is:

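```python
for paragraph in get_all_paragraphs(doc):
    new_text = paragraph.text.replace("oldtext1", "Something").replace("oldtext2", "Somethingelse")
    if new_text != paragraph.text:  # Paragraph has no replace(); only rewrite changed paragraphs
        paragraph.text = new_text
```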
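The nested row/cell loops can't really be avoided, but they can be written once and applied recursively. In python-docx both `Document` and a table cell expose `.paragraphs` and `.tables`, so a recursive generator can also reach tables nested inside cells, which the two-level loop skips. This is a sketch assuming only those attributes; `iter_paragraphs` is a hypothetical name, and the stand-in classes exist only so the snippet runs without python-docx.

```python
from dataclasses import dataclass, field


def iter_paragraphs(container):
    """Yield every paragraph in container, descending into nested tables.

    `container` is anything with .paragraphs and .tables (e.g. a python-docx
    Document or table cell). Order is: the container's own paragraphs, then
    the contents of its tables, which is not strict document order.
    """
    yield from container.paragraphs
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                # A cell is itself a container, so recurse into it
                yield from iter_paragraphs(cell)


# Stand-ins so the sketch runs without python-docx
@dataclass
class Para:
    text: str


@dataclass
class Container:  # plays the role of a Document or a table cell
    paragraphs: list
    tables: list = field(default_factory=list)


@dataclass
class Row:
    cells: list


@dataclass
class Table:
    rows: list


inner = Table(rows=[Row(cells=[Container([Para("nested")])])])
outer = Table(rows=[Row(cells=[Container([Para("cell")], tables=[inner])])])
doc = Container([Para("body")], tables=[outer])

print([p.text for p in iter_paragraphs(doc)])  # ['body', 'cell', 'nested']
```

One caveat to check against real documents: for merged cells, `row.cells` may return the same cell more than once, so a paragraph can be visited twice. That is harmless for replacements like these, where the new text does not contain the old text.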
Jared Goguen