Removing/replacing multi-line code sections with python

Question

I'm trying to remove multiple lines containing an obsolete code fragment from various files with the help of Python. I looked for examples but could not find what I was looking for. What I basically need is something that does, in principle, the following (contains non-Python syntax):

def cleanCode(filepath):
    """Clean out the obsolete or superfluous 'bar()' code."""
    with open(filepath, 'r') as foo_file:
        string = foo_file[index_of("bar("):]
        depth = 0
        for char in string:
            if char == "(": depth += 1
            if char == ")": depth -= 1
            if depth == 0: last_index = current_char_position
    with open(filepath, 'w') as foo_file:
        foo_file.write(string)

The thing is that the construct I'm parsing for and want to replace could contain other nested statements that also need to be removed as part of the bar(...) removal.

Here is a sample of the code to be cleaned:

annotation (
  foo1(k=3),
  bar(
    x=0.29,
    y=0,
    bar1(
    x=3, y=4),
    width=0.71,
    height=0.85),
  foo2(System(...))

I would think that someone might have solved something similar before :)

Unless you have thousands of complicated expressions like that, it's probably simpler to do by hand, maybe aided by `sed`. And if the problem *is* massive enough to justify a proper solution, the *real* proper solution is constructing an AST, modifying it, and converting it back to code. Fool-proof, easy, mostly already done in the stdlib, but requires understanding and may require extra work when you need an exotic coding style for the output. — , Apr 25 '12 at 15:09
Is the code you are working on also Python code? If no, what is it? — Sven Marnach, Apr 25 '12 at 15:12
I'd suggest you describe what you want to do instead of jumping straight to how you are trying to do it. This seems like something you would do with a regex replace in your editor/IDE or using a refactoring tool. — KurzedMetal, Apr 25 '12 at 15:14
Your code fragment reads the contents of a file into a string, runs a loop that does not modify the string, and then writes the unmodified string back to the same file. Am I missing something? — Sven Marnach, Apr 25 '12 at 15:14
@delnan: How would you find a string enclosed by balanced parentheses with sed? — Sven Marnach, Apr 25 '12 at 15:16
related for sed or any other regex-based solution: http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns — mensi, Apr 25 '12 at 15:18
@KurzedMetal Yes, you are right, I should not have provided my half-baked attempt, which currently does nothing, as SvenMarnach correctly noted. And yes, I need to do this on LOADS of files, and will need to do so again later (basically for cleaning up tool-generated gibberish from time to time). I would have done it with sed, but the multi-line nature of the problem means it does not work well with sed. — Dietmar Winkler, Apr 25 '12 at 15:45

score 2 · Answer 1 · answered Apr 25 '12 at 16:02

Try this:

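clo = 0  # index of the ')' found by remov(); used to cut the text below

def remov(bar):
    global clo
    open_tag = strs.find('(', bar)   # next '(' at or after position bar
    close_tag = strs.find(')', bar)  # next ')' at or after position bar
    if open_tag > close_tag:
        # the next ')' comes before the next '(': record the ')' after it
        clo = strs.find(')', close_tag + 1)
    elif open_tag < close_tag and open_tag != -1:
        # a nested '(' comes first: search again starting at its ')'
        remov(close_tag)

with open('small.in') as f:
    strs = f.read()
bar = strs.find('bar(')
remov(bar + 4)  # start searching just after 'bar('
new_strs = strs[0:bar] + strs[clo + 2:]  # clo+2 also skips the '),' itself
print(new_strs)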

output:

annotation (
  foo1(k=3),
  foo2(System(...))
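
The recursive search above appears to rely on `bar(...)` containing exactly one nested call, as it does in the sample. A more general variant counts parenthesis depth, which is what the pseudocode in the question was aiming at. The following is only a minimal sketch under stated assumptions: `remove_call` is a hypothetical helper name, the call name is matched as a whole word, and parentheses inside string literals or comments are not treated specially.

```python
import re

def remove_call(text, name):
    """Remove the first `name(...)` call, including any nested parentheses."""
    match = re.search(r'\b' + re.escape(name) + r'\s*\(', text)
    if match is None:
        return text  # nothing to remove

    # Walk forward from the opening '(' and track the nesting depth.
    depth = 0
    end = None
    for i in range(match.end() - 1, len(text)):
        if text[i] == '(':
            depth += 1
        elif text[i] == ')':
            depth -= 1
            if depth == 0:
                end = i + 1  # index just past the matching ')'
                break
    if end is None:
        return text  # unbalanced parentheses: leave the text untouched

    start = match.start()
    line_start = text.rfind('\n', 0, start) + 1
    if text[line_start:start].strip() == '':
        # The call starts its own line: drop the indentation, a trailing
        # comma and the line break as well.
        start = line_start
        tail = re.match(r'[ \t]*,?[ \t]*\n?', text[end:])
    else:
        # The call sits inside a line: drop only a trailing comma and spaces.
        tail = re.match(r'[ \t]*,?[ \t]*', text[end:])
    return text[:start] + text[end + tail.end():]

sample = (
    "annotation (\n"
    "  foo1(k=3),\n"
    "  bar(\n"
    "    x=0.29,\n"
    "    y=0,\n"
    "    bar1(\n"
    "    x=3, y=4),\n"
    "    width=0.71,\n"
    "    height=0.85),\n"
    "  foo2(System(...))\n"
)
print(remove_call(sample, 'bar'))
```

Because the depth counter only stops at the `)` that brings the depth back to zero, sibling nested calls such as `bar(g(1), h(2))` are removed as one unit.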

Thanks Aswini, this would have done the job if there were not pyparsing, which is even more comfortable :) — Dietmar Winkler, Apr 26 '12 at 06:13

PaulMcG · Accepted Answer · 2012-04-30T12:57:21.637

2

Pyparsing has some built-ins for matching nested parenthetical text - in your case, you aren't really trying to extract the content of the parens, you just want the text between the outermost '(' and ')'.

from pyparsing import White, Keyword, nestedExpr, lineEnd, Suppress

insource = """
annotation (
  foo1(k=3),
  bar(
    x=0.29,
    y=0,
    bar1(
    x=3, y=4),
    width=0.71,
    height=0.85),
  foo2(System(...))
"""

barRef = White(' \t') + Keyword('bar') + nestedExpr() + ',' + lineEnd

out = Suppress(barRef).transformString(insource)
print(out)

Prints

annotation (
  foo1(k=3),
  foo2(System(...))

EDIT: parse action to not strip bar() calls ending with '85':

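from pyparsing import ParseException

barRef = White(' \t') + Keyword('bar') + nestedExpr()('barargs') + ','
def skipEndingIn85(tokens):
    # raising ParseException makes this match fail, so the call is left in place
    if tokens.barargs[0][-1].endswith('85'):
        raise ParseException('ends with 85, skipping...')
barRef.setParseAction(skipEndingIn85)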

edited Apr 30 '12 at 12:57

answered Apr 25 '12 at 20:11


Thanks Paul, this is exactly what I was looking for. I was wondering if one could also provide a tuple of Keywords. So for example if I wanted to match 'bar()' and/or 'foo2()'. – Dietmar Winkler Apr 26 '12 at 06:41
Paul, one more question about your nice library: I tried to extend the `nestedExpr` to match some special cases, but for some reason `nestedExpr('(','85)')` will not match the 'bar' example above. Here is the whole barRef I used: `barRef = ZeroOrMore(White(' \t')) + Keyword('bar') + nestedExpr('(','85)') + ',' + ZeroOrMore(White(' \t') + lineEnd )` – Dietmar Winkler Apr 26 '12 at 07:14
The closing string in nestedExpr is supposed to be the closing string of *every* nested parenthesis, not just the last one, and your embedded parens end in '4)', not '85)'. As for matching various keywords, just change `Keyword('bar')` to `(Keyword('bar')|Keyword('baz')|Keyword('foo2'))`. – PaulMcG Apr 26 '12 at 07:29
Ah, I see. How would one then restrict the removal to only those 'bar' constructs that end with `'85)'`? – Dietmar Winkler Apr 26 '12 at 08:19
Add a parse action to the nestedExpr that looks at the last outermost token (probably tokens[0][-1]) to see if it is 85, and if it isn't, raise a ParseException. `if tokens[0][-1] != '85': raise ParseException('must end in 85')`. Look at the docs to see how to add a parse action. – PaulMcG Apr 26 '12 at 12:50
OK thanks, I think I'm gonna need some more time to digest the parse actions and how to set them up properly. – Dietmar Winkler Apr 26 '12 at 16:09
Hi Paul, OK, I tried what you said. The problem is that when I raise a `ParseException` on the embedded parens (since they do not end in 85), the script stops looking for further nested expressions and therefore does not do any replacing. I guess what I need is something "weaker" than `ParseException` that will allow the `setParseAction` to continue UNTIL it finds the nested expression ending in the correct token. I don't know if pyparsing has such a mechanism; I have not been able to find it in the docs. – Dietmar Winkler Apr 30 '12 at 10:44
Dietmar, this is not consistent with my experience, I think we'll need to dig a little deeper into what your parse action is doing. – PaulMcG Apr 30 '12 at 12:56
Ok, I added the parse action to the overall bar expression instead of nestedExpr. That avoids too many '85' tests on all nestings. – PaulMcG Apr 30 '12 at 12:58
OK, that was helpful. I mistakenly defined the parse action like this: `barRef = White(' \t') + Keyword('bar') + nestedExpr().setParseAction(foo) + ',' + lineEnd`, which seems to make a difference. There was also a slight misunderstanding because I did not express myself properly. I wanted to remove all nested expressions ending with `85`, even if they contained more nested expressions, but not the ones in which they were themselves nested. But a simple `if not tokens.bar....` did the trick. – Dietmar Winkler Apr 30 '12 at 13:52
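
The filtering discussed in these comments (several keywords, and removing only calls whose arguments end in `85`, including ones nested inside a call that is kept) can also be sketched with the standard library alone. This is an illustration only and not the pyparsing approach from the answer: `find_close`, `remove_calls`, `names` and `should_remove` are hypothetical names, and parentheses inside string literals or comments are ignored.

```python
import re

def find_close(text, open_index):
    """Return the index just past the ')' matching the '(' at open_index."""
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == '(':
            depth += 1
        elif text[i] == ')':
            depth -= 1
            if depth == 0:
                return i + 1
    return None  # unbalanced parentheses

def remove_calls(text, names, should_remove=lambda args: True):
    """Remove every name(...) call whose argument text satisfies should_remove."""
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(n) for n in names) + r')\s*\(')
    out = []
    pos = 0
    while True:
        match = pattern.search(text, pos)
        if match is None:
            break
        end = find_close(text, match.end() - 1)
        if end is None:
            break  # unbalanced: copy the rest unchanged
        args = text[match.end():end - 1]  # text between the outer parens
        if should_remove(args):
            out.append(text[pos:match.start()])
            # Also drop a comma (and spaces) directly after the call.
            tail = re.match(r'[ \t]*,?[ \t]*', text[end:])
            pos = end + tail.end()
        else:
            # Keep this call, but continue inside it so that nested
            # calls are still examined.
            out.append(text[pos:match.end()])
            pos = match.end()
    out.append(text[pos:])
    return ''.join(out)

def ends_in_85(args):
    return args.rstrip().endswith('85')

print(remove_calls("a(bar(x=1, h=85), bar(y=2), c=3)", ('bar',), ends_in_85))
print(remove_calls("f(bar(1), foo2(3), baz(2))", ('bar', 'foo2')))
```

One known limitation of this sketch: when the removed call is the last argument, the comma before it is left behind, so the result may need a second clean-up pass.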
