1

I am trying to write a function that accepts a string that looks like a Python function call and returns the arguments to that function. Example:

"fun(1, bar(x+17, 1), arr = 's,y')"

will result:

["1", "bar(x+17, 1)", "arr = 's,y'"]

The problem with using regular expressions is that I don't know whether it is possible to avoid splitting at commas inside parentheses or quotes. Thanks.

Edit: the question "Python: splitting a function and arguments" doesn't answer this correctly, since it doesn't handle commas inside parentheses or quotes.

As @Kevin said, regular expressions cannot solve this since they can't handle nested parentheses.

tal
  • possible duplicate of [Regex question about parsing method signature](http://stackoverflow.com/questions/4493844/regex-question-about-parsing-method-signature) – vaultah Jul 21 '15 at 18:36
  • possible duplicate of [Regular expression to return text between parenthesis](http://stackoverflow.com/questions/4894069/regular-expression-to-return-text-between-parenthesis) – letsc Jul 21 '15 at 18:36
  • You should use a regular expression to parse the stuff in between parentheses, and then split that string to find your arguments – Jeremy Fisher Jul 21 '15 at 18:36
  • None of these correctly handle commas in parentheses or quotes, as in the example – tal Jul 21 '15 at 18:37
  • "vanilla" regexes can't parse nested parentheses. Maybe you can do it with more advanced features, but at some point it's going to be complex enough that you may as well just write a parser. – Kevin Jul 21 '15 at 18:43
  • possible duplicate of [Partial evaluation with pyparsing](http://stackoverflow.com/questions/1920588/partial-evaluation-with-pyparsing) – duplode Jul 22 '15 at 03:52

5 Answers

3

You can keep track of your own state fairly simply with something like:

def parse_arguments(s):
    openers = "{[\"'("
    closers = "}]\"')"
    state = []
    current = ""
    for c in s:
        if c == "," and not state:
           yield current
           current = ""
        else:
           current += c
           if c in openers:
              state.append(c)
           elif c in closers:
              assert state, "ERROR No Opener for %s"%c
              assert state[-1] == openers[closers.index(c)],"ERROR Mismatched %s %s"%(state[-1],c)
              state.pop(-1)
    assert not state, "ERROR Unexpected End, expected %s"%state[-1]
    yield current

print(list(parse_arguments("1, bar(x+17, 1), arr = 's,y'")))
Joran Beasley
  • Doesn't seem to work for `parse_arguments("hello")`, I think because `"` is both closing and opening? I think it also will try to match openers/closers within strings, which it shouldn't. –  May 06 '20 at 10:20
  • I tried my best at an enhancement: https://stackoverflow.com/a/61633243/2124834 –  May 06 '20 at 10:48
2

Give this more complex split a try.

>>> import re
>>> s = "fun(1, bar(x+17, 1), arr = 's,y')"
>>> [i.strip() for i in re.split(r'''^\w+\(|\)$|((?:\([^()]*\)|'[^']*'|"[^"]*"|[^'"(),])*)''', s) if i and i !=',']
['1', 'bar(x+17, 1)', "arr = 's,y'"]
Avinash Raj
  • Good answer, but I'm pretty sure that for any regex there exists some counterexample that will break it (with nested parentheses). +1 all the same, as I can't think up a good counterexample offhand, but debugging it when it doesn't work might be a little painful – Joran Beasley Jul 21 '15 at 19:01
  • That said, I think there is an actual regexp module on PyPI (`import regex`) that does support stack memory (and other advanced regex features), so you could come up with a perfect regex solution using that module – Joran Beasley Jul 21 '15 at 19:03
1

It would be nice to do this with the ast (abstract syntax tree) standard library module, although it might be overkill:

>>> import ast
>>> parsed = ast.parse("fun(1, bar(x+17, 1), arr='s, y')")
>>> ast.dump(parsed.body[0].value)
"Call(func=Name(id='fun', ctx=Load()), args=[Num(n=1), 
Call(func=Name(id='bar', ctx=Load()), args=[BinOp(left=Name(id='x', 
ctx=Load()), op=Add(), right=Num(n=17)), Num(n=1)], keywords=[], 
starargs=None, kwargs=None)], keywords=[keyword(arg='arr', 
value=Str(s='s, y'))], starargs=None, kwargs=None)"

Unfortunately there's no standard library way to get those back to standard strings like "1", "bar(x+17, 1)" and "arr='s, y'". But https://pypi.python.org/pypi/astor can probably do that.
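Since this answer was written, the standard library has gained `ast.get_source_segment` (Python 3.8+), which returns the source text a node was parsed from. The following is only a sketch, assuming the input is a single, syntactically valid call expression; `call_arguments` is a hypothetical helper name, not part of `ast`:

```python
import ast


def call_arguments(source):
    """Return the source text of each argument of a single call expression."""
    tree = ast.parse(source, mode="eval")
    call = tree.body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single function call")

    result = []
    for node in call.args:
        result.append(ast.get_source_segment(source, node))
    for kw in call.keywords:
        # Keyword nodes carry position info on Python 3.9+; on older versions
        # the segment is None, so fall back to rebuilding "name=value".
        text = ast.get_source_segment(source, kw)
        if text is None:
            value = ast.get_source_segment(source, kw.value)
            text = "**" + value if kw.arg is None else kw.arg + "=" + value
        result.append(text)
    return result


print(call_arguments("fun(1, bar(x+17, 1), arr = 's,y')"))
```

Because the input goes through the real Python parser, nested calls, quotes and escapes are handled for free, but anything that is not valid Python syntax raises `SyntaxError`.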

RemcoGerlich
1
import re
x="fun(1, bar(x+17, 1), arr = 's,y')"
print(re.split(r",\s*(?![^\(]*\))(?![^']*'(?:[^']*'[^']*')*[^']*$)",re.findall(r"^.*?\((.*)\)",x)[0]))

You can try using re.

Output: ['1', 'bar(x+17, 1)', "arr = 's,y'"]

vks
  • Like all the regex answers, this fails on nested parentheses. Try x = "fun(bar(1,foo(1)))" – tal Jul 22 '15 at 16:04
0

Based on Joran Beasley's answer, with hopefully better string handling. The only change is the new if-arm, which allows any character while we're inside a string, including an escaped quote.

def parse_arguments(s):
    openers = "{[\"'("
    closers = "}]\"')"
    state = []
    current = ""
    for c in s:
        if c == "," and not state:
            yield current
            current = ""
        else:
            current += c
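            if state and state[-1] in "\"'":
                # Inside a string: only an unescaped matching quote closes it.
                # current[-1] is c itself, so the preceding character is current[-2].
                if c == state[-1] and current[-2] != "\\":
                    state.pop(-1)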
            else:
                if c in openers:
                    state.append(c)
                elif c in closers:
                    assert state, "ERROR No Opener for %s" % c
                    assert (
                        state[-1] == openers[closers.index(c)]
                    ), "ERROR Mismatched %s %s" % (state[-1], c)
                    state.pop(-1)
    assert not state, "ERROR Unexpected End, expected %s" % state[-1]
    yield current
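
The generator above expects only the text between the outer parentheses, while the question passes the whole call string. As a rough sketch of how the two could be combined, the hypothetical helper `split_call_arguments` below strips the `name(` ... `)` wrapper and then splits at top-level commas. It assumes single-line input with ordinary (not triple-quoted) string literals and no comments:

```python
def split_call_arguments(call):
    """Split 'name(arg, ...)' into its top-level argument strings (sketch)."""
    # Keep only the text between the first "(" and the last ")".
    inner = call[call.index("(") + 1:call.rindex(")")]

    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []        # brackets that are currently open
    quote = None      # the active quote character while inside a string
    escaped = False   # True if the previous string character was a backslash
    current = []
    args = []
    for ch in inner:
        if quote:
            # Inside a string everything is literal until the closing quote.
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            current.append(ch)
        elif ch in "([{":
            stack.append(ch)
            current.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                raise ValueError("mismatched %r" % ch)
            current.append(ch)
        elif ch == "," and not stack:
            # A comma outside all brackets and strings ends an argument.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote or stack:
        raise ValueError("unexpected end of input")
    tail = "".join(current).strip()
    if tail or args:
        args.append(tail)
    return args


print(split_call_arguments("fun(1, bar(x+17, 1), arr = 's,y')"))
```

Tracking the escape state with a flag, rather than looking at the previous character, also copes with a string that ends in an escaped backslash.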