
I am making a parser to allow calling functions in a string. I've already coded the parser and it uses regex to find matches of the syntax {function_name(whatever input)}. The parser works on nested levels and it goes down parsing until the "input" no longer contains another nested function. The "input" is just the parameters in between the brackets.

E.g. in {function(arg, arg2, arg3)} I've already managed to extract arg, arg2, arg3 as the input. Then I do a simple .split(',') to separate the arguments. This still works fine when there are nested functions with a single argument (e.g. arg, nested(arg2), arg3).

However, the problem occurs when the nested function call has its own parameters separated by a comma (e.g. arg, nested(arg2, another), arg3). I can't do a simple split anymore. I've tried regex to determine which commas are inside brackets, but if it's not greedy it doesn't catch everything, and if it is greedy it interferes with other nested functions:

arg, nested(arg2, another), arg3, call(this)
           ^->                           <-^

The various regexes I found while going through Stack Overflow either don't work in Python and/or use regex features unavailable* in the built-in library (re). * For the feature mentioned below (recursion), a more advanced regex library has it. I don't mind if answers use this library rather than Python's own regex module.

This has been an XY problem where I tried looking for other things to do, like replacing commas in brackets with a placeholder and then splitting, but that also runs into problems with nested brackets at an unknown nesting level. I also looked at removing the whole nested function call using \(([^()]|(?R))*\) from here. But that seems to give literally random results. I don't know if that's because the engine isn't working correctly; on regex101 it does capture the brackets, even with nested ones in them.

The original thing I need to do is split on a comma unless it's in brackets, and there can definitely be nested ones. If this problem can be solved without regex, that's still acceptable.

Example of expected output:

"arg, nested(arg2, another), arg3, call(this)"
['arg', 'nested(arg2, another)', 'arg3', 'call(this)']

If it matters, I'm using Python 3.9.1 on Windows to code/test the parser; the solution should work on an Ubuntu server too, though.

Ali
  • If you are writing a parser, I would then suggest that it be one that can handle context-free languages (for example any of the following: recursive descent parser, LL(1) parser or LALR(1) parser). Then, your lexical analyzer only has to recognize tokens such as , , , etc. for which you can use the `re` module, but your parser handles parsing function calls whose optional arguments may themselves be function calls. In short, your language is not *regular* and therefore a regular expression engine is not the correct tool for writing its parser. – Booboo Jul 17 '22 at 22:22
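
As a rough illustration of the approach suggested in the comment above, the sketch below uses `re` only as a lexer and leaves the nesting to a small recursive descent parser. The token names, the helpers `tokenize` and `parse_args`, and the tuple-based result are all assumptions made for this example; they are not part of the asker's parser, and arguments are assumed to be plain `\w+` names.

```python
import re

# Hypothetical token set: names, brackets and commas; leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NAME>\w+)|(?P<LPAREN>\()|(?P<RPAREN>\))|(?P<COMMA>,))"
)

def tokenize(text):
    """Yield (kind, value) pairs; raise ValueError on an unexpected character."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        yield m.lastgroup, m.group(m.lastgroup)
        pos = m.end()

def _parse_expr(tokens, pos):
    # expr := NAME | NAME '(' [list] ')'
    if pos >= len(tokens) or tokens[pos][0] != "NAME":
        raise ValueError("expected a name")
    name = tokens[pos][1]
    pos += 1
    if pos < len(tokens) and tokens[pos][0] == "LPAREN":
        pos += 1
        if pos < len(tokens) and tokens[pos][0] == "RPAREN":
            return (name, []), pos + 1          # call with no arguments
        children, pos = _parse_list(tokens, pos)
        if pos >= len(tokens) or tokens[pos][0] != "RPAREN":
            raise ValueError("missing closing bracket")
        return (name, children), pos + 1
    return name, pos                            # plain argument

def _parse_list(tokens, pos):
    # list := expr (',' expr)*
    items = []
    while True:
        node, pos = _parse_expr(tokens, pos)
        items.append(node)
        if pos < len(tokens) and tokens[pos][0] == "COMMA":
            pos += 1
            continue
        return items, pos

def parse_args(tokens):
    """Return plain names as str and calls as (name, [arguments])."""
    tokens = list(tokens)
    items, pos = _parse_list(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected token after the argument list")
    return items

print(parse_args(tokenize("arg, nested(arg2, another), arg3, call(this)")))
```

Because each call is returned as a (name, arguments) pair, the nested arguments are already split and no second pass over the string is needed.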

1 Answer


This should work for your example (I added 'arg4'):

import re
s1='arg, nested(arg2, another), arg3, call(this), arg4'
# Raw string, so that \w, \( and \) are not treated as invalid string escapes.
print(re.findall(r'\w+\([^(]*?\)|\w+', s1))

Output:

['arg', 'nested(arg2, another)', 'arg3', 'call(this)', 'arg4']

However, if a nested function has another nested function as its parameters, the code will not work properly:

s2='arg, nested(arg2, func(arg5)), arg3, call(this), arg4'
print(re.findall(r'\w+\([^(]*?\)|\w+', s2))

Output:

['arg', 'nested', 'arg2', 'func(arg5)', 'arg3', 'call(this)', 'arg4']

For that reason, the following function, which tracks the bracket depth, will work better (though I think there may also be a solution that uses re):

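def f(x):
    """Split x on commas that are not inside brackets."""
    res = []    # finished arguments
    line = ''   # argument currently being built
    c = 0       # current bracket nesting depth
    for i in x:
        if i == '(':
            c += 1
        elif i == ')':
            c -= 1
        if c == 0 and i in (' ', ','):
            # At the top level a comma ends the argument;
            # spaces are dropped either way.
            if i == ',':
                res.append(line)
                line = ''
            continue
        line += i
    res.append(line)  # the last argument has no trailing comma
    return res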

s2='arg, nested(arg2, func(arg5)), arg3, call(this), arg4'
print(f(s2))

Output:

['arg', 'nested(arg2, func(arg5))', 'arg3', 'call(this)', 'arg4']
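
On the possibility of a solution that uses re: one way it might be done with the built-in module is to hide bracket groups from the inside out, split on the remaining commas, and then put the hidden text back. This is only a sketch; the name `split_with_re` and the NUL-delimited placeholders are assumptions for this example, and it assumes the input never contains a `\x00` character and that the brackets are balanced.

```python
import re

# A bracket pair that contains no other brackets, i.e. an innermost group.
INNERMOST = re.compile(r"\([^()]*\)")
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")

def split_with_re(text):
    stash = []  # hidden bracket groups, indexed by placeholder number

    def hide(match):
        stash.append(match.group(0))
        return f"\x00{len(stash) - 1}\x00"

    # Collapse bracket groups from the inside out until none remain.
    masked = text
    while True:
        masked, count = INNERMOST.subn(hide, masked)
        if count == 0:
            break

    def restore(piece):
        # A hidden group may itself contain placeholders, so repeat.
        while "\x00" in piece:
            piece = PLACEHOLDER.sub(lambda m: stash[int(m.group(1))], piece)
        return piece

    # Only top-level commas are still visible at this point.
    return [restore(part.strip()) for part in masked.split(",")]

print(split_with_re("arg, nested(arg2, func(arg5)), arg3, call(this), arg4"))
```

The loop is what handles an unknown nesting level: each pass removes one layer of brackets, so the number of passes equals the deepest nesting in the input. With an unmatched bracket, some commas inside it would stay visible and be split on.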

P.S. The presented code works on examples, but I am afraid that there may be details left out. Please be vigilant if you use it and let us know if you find any flaws.
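
Two details that the function f above does leave out: it removes every space at depth 0, so an argument such as `a b` would come back as `ab`, and it does not complain about unbalanced brackets. A slightly stricter variant of the same depth-counting idea could look like the sketch below. The name `split_top_level` and its `sep`, `opener` and `closer` parameters are made up for this example.

```python
def split_top_level(text, sep=",", opener="(", closer=")"):
    """Split text on sep wherever the bracket depth is zero."""
    parts = []
    current = []   # characters of the argument being built
    depth = 0      # current bracket nesting depth
    for ch in text:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without a matching opener")
        if ch == sep and depth == 0:
            # Top-level separator: finish the current argument.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unclosed bracket")
    parts.append("".join(current).strip())
    return parts

print(split_top_level("arg, nested(arg2, func(arg5)), arg3, call(this), arg4"))
```

Only surrounding whitespace is stripped from each argument, so spaces inside an argument are preserved, and malformed input raises a ValueError instead of silently producing a wrong split.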