Python Combine For-Loops

Question

I am using python-docx to manipulate Word documents. Here is what I currently have to modify text in normal paragraphs:

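from docx import Document  # python-docx

doc = Document('idk.docx')
for paragraph in doc.paragraphs:
    # python-docx's Paragraph has no replace() method; Paragraph.text is read/write
    if "oldtext1" in paragraph.text:
        paragraph.text = paragraph.text.replace("oldtext1", "Something")
    if "oldtext2" in paragraph.text:
        paragraph.text = paragraph.text.replace("oldtext2", "Somethingelse")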

If I want to modify the text in a table, I need to do the following:

tables = doc.tables
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                if "oldtext1" in paragraph.text:
                    paragraph.text = paragraph.text.replace("oldtext1", "Something")
                if "oldtext2" in paragraph.text:
                    paragraph.text = paragraph.text.replace("oldtext2", "Somethingelse")

The code works and the text is replaced. The problem is that I want to replace ALL instances of the text in the document without maintaining two separate loops (one for text in normal paragraphs and another for text in tables).

Is there an easy way to combine these loops so I do not have to have the same if-statements in 2 different loops?

You are looping over different things, so I don't see anything wrong with this code — OneCricketeer, Mar 15 '16 at 23:20
@cricket_007 I agree, although I would recommend putting the per-paragraph processing in a function to avoid code repetition — DaveBensonPhillips, Mar 15 '16 at 23:29
@HumphreyTriscuit - I was going to say that, but that's a personal preference and I wasn't sure both blocks would be the same — OneCricketeer, Mar 15 '16 at 23:30
@cricket_007: The code I currently have is fine, but there are going to be a lot more if statements, and I don't want them duplicated in both sets of loops since they will be identical — Bijan, Mar 15 '16 at 23:37
@Bijan cricket is correct, though; your loops are fine. If you want fewer `if` statements then put that logic in a function and call it from each of your loops. — DaveBensonPhillips, Mar 15 '16 at 23:39
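Following the suggestion in the comments, the per-paragraph logic can live in one function that both loops call. This is a minimal sketch, assuming python-docx's read/write `Paragraph.text`; `replace_in_paragraph` and `REPLACEMENTS` are hypothetical names, not part of python-docx:

```python
# Hypothetical mapping of search text -> replacement text.
REPLACEMENTS = {
    "oldtext1": "Something",
    "oldtext2": "Somethingelse",
}


def replace_in_paragraph(paragraph, replacements):
    """Apply every old -> new pair in `replacements` to one paragraph.

    Assumes a python-docx Paragraph (or any object with a read/write
    `text` attribute). The text is only written back when a replacement
    actually changed it, because assigning Paragraph.text replaces the
    paragraph's runs with a single run and loses character formatting.
    """
    original = paragraph.text
    text = original
    for old, new in replacements.items():
        text = text.replace(old, new)  # no-op when `old` is absent
    if text != original:
        paragraph.text = text
```

Both loops then shrink to a single call, `replace_in_paragraph(paragraph, REPLACEMENTS)`, and a new replacement is one more dictionary entry instead of another pair of `if` statements. Replacements run in the dictionary's insertion order (guaranteed from Python 3.7), which matters only if one replacement's output contains another's search text.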

DaveBensonPhillips · Accepted Answer · 2016-03-16T02:30:45.883

I would just chain the top-level paragraphs with a generator expression over the table cells:

from itertools import chain

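for paragraph in chain(doc.paragraphs,
                       (para
                        for table in doc.tables
                        for row in table.rows
                        for cell in row.cells
                        for para in cell.paragraphs)):
    paragraph.text = paragraph.text.replace("oldtext1", "Something")
    paragraph.text = paragraph.text.replace("oldtext2", "Somethingelse")

Note that you don't need the `if "oldtext1" in paragraph.text` lookahead check for correctness: `str.replace()` returns the text unchanged when the substring is absent. Be aware, though, that assigning to `paragraph.text` rebuilds the paragraph as a single run, so character formatting is lost in every paragraph you reassign; keeping the check at least leaves non-matching paragraphs untouched.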

edited Mar 16 '16 at 02:30

answered Mar 15 '16 at 23:18

So does this first go through all the paragraphs and then go through all the text in cells? – Bijan Mar 15 '16 at 23:41
@Bijan Yes, but if you switch the order of addition it will do the opposite – DaveBensonPhillips Mar 15 '16 at 23:43
Note: Using [`itertools.chain`](https://docs.python.org/3/library/itertools.html#itertools.chain) and changing the listcomp to a genexpr would allow you to iterate lazily, instead of creating 2 potentially huge temporary `list`s up front. Also need to reverse order of loops. `from itertools import chain`, `for paragraph in chain(doc.paragraphs, (para for table in doc.tables for row in table.rows for cell in row.cells for para in cell.paragraphs)):` will get the same results (unless early loop mutates one of the values in `doc.tables`), but produce initial results faster with lower peak memory. – ShadowRanger Mar 16 '16 at 02:10
@HumphreyTriscuit: I kinda glossed over it in my last comment, but you have the order of the loops in your listcomp backwards. The leftmost, not rightmost, `for` should be over `doc.tables`, then `table.rows`, etc. – ShadowRanger Mar 16 '16 at 02:15
@ShadowRanger You're right, thanks! Fixed it up. Also completely agree on using `itertools.chain()`, I'll add that to my answer. Thanks again – DaveBensonPhillips Mar 16 '16 at 02:31
@JaredGoguen OK, and there are lots of things I would do, too. This is just a proof of concept that helps to answer his question – DaveBensonPhillips Mar 16 '16 at 14:39
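ShadowRanger's two points (the leftmost `for` clause is the outermost loop, and `chain` plus a generator expression iterates lazily) can be checked with plain nested lists standing in for tables, rows, cells and paragraphs. The toy data below is hypothetical and no python-docx objects are involved:

```python
from itertools import chain

# Hypothetical stand-ins: a table is a list of rows, a row is a list of cells,
# and a cell is a list of paragraph strings.
body = ["p1", "p2"]
tables = [
    [[["a1"], ["a2"]], [["a3"]]],  # table A: row 1 has two cells, row 2 has one
    [[["b1"]]],                    # table B: a single cell
]


def nested_loops(tables):
    """Collect paragraphs with explicit nested for statements."""
    out = []
    for table in tables:
        for row in table:
            for cell in row:
                for para in cell:
                    out.append(para)
    return out


def table_paragraphs(tables):
    # The for clauses read left to right, outermost loop first,
    # in the same order as the nested statements above.
    return (para for table in tables for row in table for cell in row for para in cell)


# chain() and the generator expression are both lazy: nothing is copied
# into a temporary list; items are produced one at a time on demand.
combined = chain(body, table_paragraphs(tables))
first = next(combined)  # only "p1" has been produced so far
rest = list(combined)
```

Here `first` is `"p1"` and `rest` continues with `"p2"` and then the table paragraphs in row-major order, exactly as the explicit nested loops would visit them.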

Jared Goguen · Answer 2 · 2016-03-16T19:17:02.387

While a generator expression works fine, it might be cleaner to move this task into its own generator function. It is considerably more readable.

# Python 2.X
def get_all_paragraphs(document):
    for paragraph in document.paragraphs:
        yield paragraph

    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    yield paragraph

In Python 3.3+ this can be cleaned up with the `yield from` construct.

# Python 3.X
def get_all_paragraphs(document):
    yield from document.paragraphs

    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from cell.paragraphs

However, I can't think of a way around the nested "for row in table.rows... for cell in row.cells..." pattern.
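Neither version descends into tables nested inside cells. In python-docx a cell exposes `.paragraphs` and `.tables` just like the document does, so a recursive variant can reuse the same row/cell pattern at every level. This is a sketch that relies only on those attributes plus `table.rows` and `row.cells`; `iter_paragraphs` is a hypothetical name:

```python
def iter_paragraphs(container):
    """Yield every paragraph in `container` (a python-docx Document or cell),
    including paragraphs in tables nested inside cells at any depth.
    """
    # Paragraphs directly in this container come first ...
    yield from container.paragraphs
    # ... then every cell of every table, recursing so that a table inside
    # a cell is walked the same way as a top-level table.
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from iter_paragraphs(cell)
```

It drops into the same loop as `get_all_paragraphs(doc)`. Because `doc.paragraphs` lists only body-level paragraphs, the order is all body paragraphs first and then table contents, not the visual order of the document.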

The usage is:

for paragraph in get_all_paragraphs(doc):
    paragraph.text = paragraph.text.replace("oldtext1", "Something")
    paragraph.text = paragraph.text.replace("oldtext2", "Somethingelse")
