My Python re.subn is too greedy. I am modifying an OFX file (XML or SGML) held in the string ofx.
I want to remove any buy or sell that contains a particular hard-coded CUSIP, without affecting any other transaction. A buy starts with <BUYMF> and ends with the next </BUYMF>. A sell starts with <SELLMF> and ends with the next </SELLMF>. If either contains 123456789, I want to remove that whole transaction from the ofx string.
I added question marks after all of the wildcards in the pattern to make them non-greedy:
(ofx, sub_count) = re.subn(
    r'<(SELLMF|BUYMF)>.*?<UNIQUEID>\s*?123456789.*?</(SELLMF|BUYMF)>',
    '', ofx, flags=re.MULTILINE | re.DOTALL)
I expected only the buy and sell transactions containing that CUSIP to be removed, but instead a big block of transactions gets removed.
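A minimal reproduction of the behaviour, assuming a made-up and heavily simplified three-transaction fragment (real OFX has many more tags per transaction). A lazy quantifier only keeps the match short from wherever it started, and the regex engine starts at the earliest opening tag it can, so the first <BUYMF> is swallowed along with the sell that actually holds the CUSIP:

```python
import re

# Hypothetical sample data: only the middle transaction has CUSIP 123456789.
ofx = (
    "<BUYMF><UNIQUEID>111111111</UNIQUEID></BUYMF>\n"
    "<SELLMF><UNIQUEID>123456789</UNIQUEID></SELLMF>\n"
    "<BUYMF><UNIQUEID>222222222</UNIQUEID></BUYMF>\n"
)

# Same pattern as in the question.
pattern = r'<(SELLMF|BUYMF)>.*?<UNIQUEID>\s*?123456789.*?</(SELLMF|BUYMF)>'
result, sub_count = re.subn(pattern, '', ofx, flags=re.MULTILINE | re.DOTALL)

# The single match runs from the first <BUYMF> through </SELLMF>,
# so the unrelated 111111111 buy disappears too.
print(sub_count)
print(result)
```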
Edit after I marked my solution and then corrected it: All criticisms and comments were correct and very useful. Thanks.
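For later readers, here is a minimal sketch of one possible correction; it is not necessarily the one that was accepted. It assumes buy and sell blocks are never nested inside each other. The wildcard is replaced with a version that refuses to step over any opening or closing BUYMF/SELLMF tag, and the closing tag must match the opening one through a backreference. The function name remove_transactions, the helper constants, and the sample data are made up for illustration.

```python
import re

# Any opening or closing buy/sell tag; the wildcard below may not cross one.
BLOCK_TAG = r'</?(?:SELLMF|BUYMF)>'
# "Any character, as few as possible, as long as no block tag starts here."
INSIDE_BLOCK = rf'(?:(?!{BLOCK_TAG}).)*?'


def remove_transactions(ofx, cusip):
    """Remove every BUYMF/SELLMF block whose UNIQUEID starts with cusip.

    Returns (new_string, number_of_blocks_removed), like re.subn.
    """
    pattern = re.compile(
        rf'<(SELLMF|BUYMF)>{INSIDE_BLOCK}'
        rf'<UNIQUEID>\s*{re.escape(cusip)}{INSIDE_BLOCK}</\1>',
        flags=re.DOTALL,  # MULTILINE is not needed: the pattern has no ^ or $
    )
    return pattern.subn('', ofx)


# Hypothetical sample data: only the middle transaction should go.
sample = (
    "<BUYMF><UNIQUEID>111111111</UNIQUEID></BUYMF>\n"
    "<SELLMF><UNIQUEID>123456789</UNIQUEID></SELLMF>\n"
    "<BUYMF><UNIQUEID>222222222</UNIQUEID></BUYMF>\n"
)
cleaned, removed = remove_transactions(sample, '123456789')
print(removed)
print(cleaned)
```

If the file is well-formed XML rather than SGML, parsing it and dropping the matching elements would likely be more robust than any regex.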