My Python `re.subn` call is too greedy. I am modifying an OFX file (XML or SGML) whose contents are held in the string `ofx`.

I want to remove every buy or sell that contains a particular hard-coded CUSIP and leave everything else untouched. A buy starts with `<BUYMF>` and ends with the next `</BUYMF>`. A sell starts with `<SELLMF>` and ends with the next `</SELLMF>`. If either contains 123456789, I want to remove that whole transaction from the `ofx` string.

I added a question mark after every wildcard in the pattern to make it non-greedy:

import re

# re.DOTALL lets "." match newlines, so one match can span several lines.
ofx, sub_count = re.subn(
    r'<(SELLMF|BUYMF)>.*?<UNIQUEID>\s*?123456789.*?</(SELLMF|BUYMF)>',
    '', ofx, flags=re.MULTILINE | re.DOTALL)

I expected only the buy and sell transactions for that CUSIP to be removed, but instead a large block of transactions is removed.

Edit after I marked my solution and then corrected it: All criticisms and comments were correct and very useful. Thanks.

CL1
  • Along with the description, it could help if you added a few examples including input and expected output. – Kostas Mouratidis Jun 11 '19 at 16:53
  • The first `.*?` matches _everything_ until a "123456789" is encountered, _including_ any sells and buys in between; that's _why_ it is too greedy. – Jos Jun 11 '19 at 17:29
  • While preparing the example input, I realized my flaw. There was no buy or sell of 123456789, but there was a 123456789 for a dividend. I see how Python is indeed selecting the smallest match; per my intention, there should have been no match at all. I don't yet have a way to remove only the buys and sells of 123456789, but I see why my code failed and can now work toward a solution more effectively. Thanks, folks. – CL1 Jun 11 '19 at 17:37
  • No need to add "status" messages in the title. If you find a solution, feel free to post it as an answer. – glibdud Jun 11 '19 at 17:56
  • Possible solution: Every transaction will have a `
    .*?\s*?123456789((?!
    ' \ ,'',ofx, flags=re.MULTILINE | re.DOTALL) ``` I will be able to test now for lack of greed, and I will wait for the case to occur where I want to actually remove the transaction.
    – CL1 Jun 11 '19 at 18:16
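
The over-match described in the comments above can be reproduced with a small made-up fragment. This is only a sketch: the sample transactions, the second CUSIP 999999999, and the `<INCOME>` dividend record are invented for illustration and are not taken from the real file.

```python
import re

# Hypothetical sample: a buy of another security, a dividend (INCOME)
# record for the target CUSIP, then a sell of the other security.
ofx = (
    "<BUYMF><SECID><UNIQUEID>999999999</SECID></BUYMF>"
    "<INCOME><SECID><UNIQUEID>123456789</SECID></INCOME>"
    "<SELLMF><SECID><UNIQUEID>999999999</SECID></SELLMF>"
)

# The pattern from the question: every wildcard is lazy.
pattern = r'<(SELLMF|BUYMF)>.*?<UNIQUEID>\s*?123456789.*?</(SELLMF|BUYMF)>'

result, sub_count = re.subn(pattern, '', ofx, flags=re.DOTALL)

print(sub_count)     # 1
print(repr(result))  # '' (all three records were swallowed by one match)
```

Lazy quantifiers only pick the shortest match for a given starting point. The match starts at the first `<BUYMF>`, and the shortest way to reach `123456789` from there runs straight through the unrelated buy and the dividend.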

1 Answer

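import re

# (?:(?!MF>).)* matches any run of characters that never steps onto "MF>",
# so a match cannot cross the end of the block it started in.
ofx, sub_count = re.subn(
    r'<(SELLMF|BUYMF)>(?:(?!MF>).)*<UNIQUEID>\s*?123456789(?:(?!MF>).)*</(SELLMF|BUYMF)>',
    '', ofx, flags=re.MULTILINE | re.DOTALL)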

Works as modified above. I found my first "solution" was flawed. There may be better solutions. Thanks to all who commented.
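
As a rough check, the pattern can be run against a made-up fragment. The sample data, the second CUSIP 999999999, and the names `BLOCK`, `UNIQUEID`, and `drop_if_cusip` are all hypothetical. The second half sketches one possible alternative, assuming blocks are never nested: match each complete block first, then decide in a replacement callback whether to drop it.

```python
import re

CUSIP = "123456789"

# Hypothetical sample: a buy of another security, a dividend (INCOME)
# record for the target CUSIP, then a buy and a sell of the target CUSIP.
ofx = (
    "<BUYMF><SECID><UNIQUEID>999999999</SECID></BUYMF>"
    "<INCOME><SECID><UNIQUEID>123456789</SECID></INCOME>"
    "<BUYMF><SECID><UNIQUEID>123456789</SECID></BUYMF>"
    "<SELLMF><SECID><UNIQUEID> 123456789</SECID></SELLMF>"
)

# Pattern from the answer, with the CUSIP spliced in.
tempered = (r'<(SELLMF|BUYMF)>(?:(?!MF>).)*<UNIQUEID>\s*?' + CUSIP +
            r'(?:(?!MF>).)*</(SELLMF|BUYMF)>')
cleaned, sub_count = re.subn(tempered, '', ofx, flags=re.DOTALL)

# Alternative sketch: match every complete block lazily (the backreference
# \1 forces the closing tag to match the opening one), then inspect it.
BLOCK = re.compile(r'<(SELLMF|BUYMF)>.*?</\1>', re.DOTALL)
UNIQUEID = re.compile(r'<UNIQUEID>\s*' + CUSIP)

def drop_if_cusip(match):
    """Return '' to delete a block holding the CUSIP, else keep it as is."""
    block = match.group(0)
    return '' if UNIQUEID.search(block) else block

cleaned_alt = BLOCK.sub(drop_if_cusip, ofx)

print(sub_count)               # 2
print(cleaned == cleaned_alt)  # True
```

On this sample both approaches remove only the buy and the sell of 123456789 and keep the other buy and the dividend record. The callback version keeps the "where does a block end" and "does it hold the CUSIP" questions in separate, simpler patterns.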

CL1