The regular expression below (.NET/C# or PowerShell flavor) matches an element and its inner text in a simple XML file, one line at a time (multiline matching is not needed).
^[^<]*<(?'Element'[^>\s]*)[^>]*>(?'Text'[^<]*)<\/\1>\s*$
This regex matches the following inputs correctly and quite efficiently (see the online simulation):
<ELEMENT>inner text</ELEMENT>
<ELEMENT>inner text</ELEMENT>
<ELEMENT>inner text</ELEMENT>
<ELEMENT > inner text </ELEMENT>
<ELEMENT > inner text </ELEMENT>
<ELEMENT > inner text </ELEMENT>
<ELEMENT ATTRIB="foo"> inner text </ELEMENT>
<ELEMENT ATTRIB="foo"> inner text </ELEMENT>
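A minimal sketch of how the pattern can be exercised from C#, assuming a plain console program. The class name `MatchDemo` and the choice of three sample lines are only illustrative; the pattern itself is copied unchanged from above.

```csharp
using System;
using System.Text.RegularExpressions;

class MatchDemo
{
    static void Main()
    {
        // Pattern from the question, unchanged. With no unnamed groups,
        // the named group 'Element' is group number 1, so \1 refers to it.
        var regex = new Regex(@"^[^<]*<(?'Element'[^>\s]*)[^>]*>(?'Text'[^<]*)<\/\1>\s*$");

        // The three distinct matching shapes from the list above.
        string[] lines =
        {
            "<ELEMENT>inner text</ELEMENT>",
            "<ELEMENT > inner text </ELEMENT>",
            "<ELEMENT ATTRIB=\"foo\"> inner text </ELEMENT>",
        };

        foreach (string line in lines)
        {
            Match m = regex.Match(line);
            string element = m.Groups["Element"].Value;
            string text = m.Groups["Text"].Value;
            Console.WriteLine($"{m.Success}: Element=[{element}] Text=[{text}]");
        }
    }
}
```

Each line should be reported as a successful match with `Element` equal to `ELEMENT`; the `Text` group keeps any whitespace surrounding the inner text.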
However, inputs that are not supposed to match are rejected correctly, but only after a lot of backtracking, which makes the failures very inefficient (see the online simulation):
ELEMENT ATTRIB="foo"> inner text </ELEMENT>
< ELEMENT ATTRIB="foo"> inner text </ELEMENT>
< ELEMENT ATTRIB="foo"> inner text </ELEMENT>
<ELEMENT>inner text</FOO>
ELEMENT ATTRIB="foo"> inner text </ELEMENT>
QUESTION: Can I use atomic groups to prevent this backtracking and speed up failed matches without slowing down successful ones, and if so, how?
If .NET and PowerShell supported possessive quantifiers, I would be asking about them too.
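To make the question concrete, here is a rough sketch of what an atomic-group variant and a crude timing comparison could look like. It is an assumption, not a verified solution: the `Atomic` pattern simply wraps each greedy run in `(?>...)`, and the names `AtomicSketch`, `Time` and the iteration count are hypothetical. One behavioral difference to be aware of: the original pattern can backtrack into a shorter element name (so `<ELEMENT>x</ELEMEN>` matches it with `Element` = `ELEMEN`), while the atomic variant cannot, so the two are not strictly equivalent.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class AtomicSketch
{
    // Pattern from the question, unchanged.
    const string Original = @"^[^<]*<(?'Element'[^>\s]*)[^>]*>(?'Text'[^<]*)<\/\1>\s*$";

    // Hypothetical variant: every greedy character-class run is wrapped in an
    // atomic group (?>...), so the engine cannot give characters back once
    // the run has been consumed.
    const string Atomic = @"^(?>[^<]*)<(?'Element'(?>[^>\s]*))(?>[^>]*)>(?'Text'(?>[^<]*))<\/\1>\s*$";

    // Crude wall-clock timing; not a substitute for a real benchmark harness.
    static long Time(Regex regex, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.IsMatch(input);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        var original = new Regex(Original);
        var atomic = new Regex(Atomic);

        string[] inputs =
        {
            "<ELEMENT ATTRIB=\"foo\"> inner text </ELEMENT>",  // expected to match
            "ELEMENT ATTRIB=\"foo\"> inner text </ELEMENT>",   // no opening '<'
            "< ELEMENT ATTRIB=\"foo\"> inner text </ELEMENT>", // space after '<'
            "<ELEMENT>inner text</FOO>",                       // mismatched close tag
        };

        const int iterations = 200000; // arbitrary illustrative value

        foreach (string input in inputs)
        {
            bool a = original.IsMatch(input);
            bool b = atomic.IsMatch(input);
            long ta = Time(original, input, iterations);
            long tb = Time(atomic, input, iterations);
            Console.WriteLine($"{input}");
            Console.WriteLine($"  original: match={a}, {ta} ms   atomic: match={b}, {tb} ms");
        }
    }
}
```

Whether the atomic variant is measurably faster will depend on the runtime version and on the inputs, so the timings are only meaningful when compared on the same machine and build.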
P.S. This question applies not only to XML inputs. It is about general regex optimization with atomic groups in .NET or PowerShell, not about processing this particular XML input.