I'm trying to write a regex to use with Python's re.findall function to find all comments in a Word comments.xml file that contain a particular paraId.
My regex is <w:comment.*?w14:paraId="727F9BCE".*?</w:comment>
I want to match only the second of the comments below, but my regex matches both.
<w:comment w:id="0" w:author="LW 2" w:date="2023-05-03T22:54:00Z" w:initials="LFW2">
<w:p w14:paraId="698DC7BC" w14:textId="22BC570B" w:rsidR="006040CE" w:rsidRDefault="006040CE">
<w:pPr>
<w:pStyle w:val="CommentText"/>
</w:pPr>
<w:r>
<w:rPr>
<w:rStyle w:val="CommentReference"/>
</w:rPr>
<w:annotationRef/>
</w:r>
<w:r>
<w:t>Open comment</w:t>
</w:r>
</w:p>
</w:comment>
<w:comment w:id="1" w:author="LW 2" w:date="2023-05-03T22:54:00Z" w:initials="LFW2">
<w:p w14:paraId="727F9BCE" w14:textId="1EDEEF44" w:rsidR="006040CE" w:rsidRDefault="006040CE">
<w:pPr>
<w:pStyle w:val="CommentText"/>
</w:pPr>
<w:r>
<w:rPr>
<w:rStyle w:val="CommentReference"/>
</w:rPr>
<w:annotationRef/>
</w:r>
<w:r>
<w:t>Done comment</w:t>
</w:r>
</w:p>
</w:comment>
I understood that adding ? after the .* quantifier makes it lazy rather than greedy (so it matches the shortest string that can match), but that isn't the behavior I'm seeing from re.findall under Python 3.8.
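The behavior can be reproduced on a trimmed-down sample. The sketch below assumes the real file is a single line (or that re.DOTALL is in use), and the SAMPLE string is a hypothetical stand-in for comments.xml. It suggests that laziness only controls where a match ends, not where it starts: the match begins at the first <w:comment and keeps growing until the paraId appears. The second pattern is one possible variant, not necessarily the best fix, that refuses to cross a closing </w:comment>.

```python
import re

# Hypothetical, trimmed-down stand-in for comments.xml (one line, two comments).
SAMPLE = (
    '<w:comment w:id="0"><w:p w14:paraId="698DC7BC">'
    '<w:t>Open comment</w:t></w:p></w:comment>'
    '<w:comment w:id="1"><w:p w14:paraId="727F9BCE">'
    '<w:t>Done comment</w:t></w:p></w:comment>'
)

# Original pattern: the match starts at the FIRST "<w:comment", and the lazy
# ".*?" keeps expanding until the paraId is found inside the second comment.
NAIVE = re.compile(
    r'<w:comment.*?w14:paraId="727F9BCE".*?</w:comment>',
    re.DOTALL,
)

# Variant: "(?:(?!</w:comment>).)*?" consumes one character at a time but will
# not step over a closing </w:comment>, so a match cannot span two comments.
# "\b" stops "<w:comment" from also matching "<w:comments" or
# "<w:commentReference".
TEMPERED = re.compile(
    r'<w:comment\b(?:(?!</w:comment>).)*?w14:paraId="727F9BCE".*?</w:comment>',
    re.DOTALL,
)

naive_matches = NAIVE.findall(SAMPLE)
tempered_matches = TEMPERED.findall(SAMPLE)

print(len(naive_matches), naive_matches[0][:20])
print(tempered_matches)
```

On this sample the original pattern returns a single match that starts at w:id="0" and runs through the end of the second comment, while the variant returns only the w:id="1" comment.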
I am aware I could (and probably should) use an XML parser, but I've been having many issues with namespaces and thought a regex would be simpler.
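For comparison, here is a minimal sketch of the parser route using the standard-library xml.etree.ElementTree. It assumes the usual WordprocessingML namespace URIs for the w and w14 prefixes. The helper name comments_with_para_id and the SAMPLE document are hypothetical. Namespaced tags and attributes are spelled in ElementTree's "{uri}name" form, which sidesteps prefix lookups entirely.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the "w" and "w14" prefixes used by Word.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"


def comments_with_para_id(xml_text, para_id):
    """Return the w:comment elements containing a w:p with the given paraId."""
    root = ET.fromstring(xml_text)
    para_attr = "{%s}paraId" % W14
    return [
        comment
        for comment in root.iter("{%s}comment" % W)
        if any(
            p.get(para_attr) == para_id
            for p in comment.iter("{%s}p" % W)
        )
    ]


# Hypothetical, trimmed-down stand-in for comments.xml.
SAMPLE = (
    '<w:comments xmlns:w="%s" xmlns:w14="%s">'
    '<w:comment w:id="0"><w:p w14:paraId="698DC7BC">'
    '<w:r><w:t>Open comment</w:t></w:r></w:p></w:comment>'
    '<w:comment w:id="1"><w:p w14:paraId="727F9BCE">'
    '<w:r><w:t>Done comment</w:t></w:r></w:p></w:comment>'
    '</w:comments>' % (W, W14)
)

found = comments_with_para_id(SAMPLE, "727F9BCE")
found_ids = [c.get("{%s}id" % W) for c in found]
found_text = ["".join(c.itertext()) for c in found]
print(found_ids, found_text)
```

With a real document, the same helper could be given the text of word/comments.xml read from the .docx archive.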