3

I'd like to create a regular expression in Python that will match against a line in Python source code and return a list of function calls.

The typical line would look like this:

something = a.b.method(time.time(), var=1) + q.y(x.m())

and the result should be:

["a.b.method()", "time.time()", "q.y()", "x.m()"]

I have two problems here:

  1. creating the correct pattern
  2. the capture groups are overlapping

Thank you for your help.

xralf

5 Answers

13

I don't think regular expressions are the best approach here. Consider the ast module instead, for example:

import ast


class ParseCall(ast.NodeVisitor):
    """Collect the dotted parts of a callee such as a.b.method."""
    def __init__(self):
        self.ls = []
    def visit_Attribute(self, node):
        # Visit the left-hand side first so the parts end up in source order.
        ast.NodeVisitor.generic_visit(self, node)
        self.ls.append(node.attr)
    def visit_Name(self, node):
        self.ls.append(node.id)


class FindFuncs(ast.NodeVisitor):
    def visit_Call(self, node):
        p = ParseCall()
        p.visit(node.func)
        print(".".join(p.ls))
        # Keep walking so calls nested in the arguments are found too.
        ast.NodeVisitor.generic_visit(self, node)


code = 'something = a.b.method(foo() + xtime.time(), var=1) + q.y(x.m())'
tree = ast.parse(code)
FindFuncs().visit(tree)

Result:

a.b.method
foo
xtime.time
q.y
x.m
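
To get a list in the format the question asks for, instead of printed names, a variation along these lines might work. This is only a sketch: `CallCollector` and `list_calls` are illustrative names, not part of the `ast` module, and callees that are not plain dotted names (for example `f()()` or `x[0]()`) are simply skipped.

```python
import ast


class CallCollector(ast.NodeVisitor):
    """Hypothetical helper: gather dotted call names in source order."""

    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        name = self._dotted_name(node.func)
        if name is not None:
            self.calls.append(name + "()")
        # Continue into the arguments so nested calls are collected too.
        self.generic_visit(node)

    @staticmethod
    def _dotted_name(node):
        # Walk a.b.method from the right-most attribute back to the base name.
        parts = []
        while isinstance(node, ast.Attribute):
            parts.append(node.attr)
            node = node.value
        if isinstance(node, ast.Name):
            parts.append(node.id)
            return ".".join(reversed(parts))
        return None  # callee is not a plain dotted name


def list_calls(line):
    collector = CallCollector()
    collector.visit(ast.parse(line))
    return collector.calls


print(list_calls('something = a.b.method(time.time(), var=1) + q.y(x.m())'))
```

Because the line is parsed rather than pattern-matched, text that merely looks like a call inside a string literal, as in `foo("bar(a,b)")`, is not reported.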
georg
  • +1 nice tutorial on the `ast` module! Nice to know that it provides something a bit more useful than just `literal_eval` :) – Karl Knechtel Dec 28 '11 at 19:41
  • In fact, unless I'm mistaken, a regex-based approach is doomed to fail. The Python language is based upon a context-free grammar, and (again unless I'm mistaken) a CFG is more expressive than a regular expression (thank you, [Chomsky Hierarchy](http://en.wikipedia.org/wiki/Chomsky_hierarchy)). – Adam Parkin Sep 10 '12 at 17:22
  • @AdamParkin: some of the answers to [this question](http://stackoverflow.com/questions/11306641/what-kind-of-formal-languages-can-modern-regex-engines-parse) might be interesting for you. – georg Sep 10 '12 at 18:00
4
$ python3
>>> import re
>>> from itertools import chain
>>> def fun(s, r):
...     t = re.sub(r'\([^()]+\)', '()', s)
...     m = re.findall(r'[\w.]+\(\)', t)
...     t = re.sub(r'[\w.]+\(\)', '', t)
...     if m==r:
...         return
...     for i in chain(m, fun(t, m)):
...         yield i
...
>>> list(fun('something = a.b.method(time.time(), var=1) + q.y(x.m())', []))
['time.time()', 'x.m()', 'a.b.method()', 'q.y()']
kev
2
/([.a-zA-Z]+)\(/g

should match the method names; you'd have to append the parentheses afterwards, since some of the calls are nested.
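
In Python the same pattern could be applied roughly as follows. This is a sketch under the assumption that `re.findall` stands in for the `/g` flag; `CALL_NAME` and `find_call_names` are illustrative names only.

```python
import re

# Same pattern as above, written without the /.../g delimiters.
CALL_NAME = re.compile(r'([.a-zA-Z]+)\(')


def find_call_names(line):
    # findall returns every non-overlapping match (like /g); with a single
    # capture group it yields just the captured name, so "()" is appended.
    return [name + '()' for name in CALL_NAME.findall(line)]


line = 'something = a.b.method(time.time(), var=1) + q.y(x.m())'
print(find_call_names(line))
```

Note that the character class excludes digits and underscores, and that the pattern cannot tell code from string contents: `find_call_names('foo("bar(a,b)")')` reports `bar()` as well as `foo()`.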

Evan Davis
  • `foo("bar(a,b)")` would return `bar` incorrectly for that regex. – Douglas Leeder Dec 28 '11 at 16:42
  • @DouglasLeeder It looks good but [this](http://pastebin.com/7dKpRh5B) Python code doesn't print what is expected. – xralf Dec 28 '11 at 17:35
  • @xralf Looks like Python doesn't use the bounding slashes, and also uses different functions for global search: http://pastebin.com/QbD2awfJ should do what you want. – Evan Davis Dec 28 '11 at 17:49
  • @DouglasLeeder Thank you. This works well now, but thg435's solution seems to cover more special cases. – xralf Dec 28 '11 at 18:01
1

I don't really know Python, but I can imagine that making this work properly involves some complications, e.g.:

  • strings
  • comments
  • expressions that return an object

But for your example, an expression like this works:

(?:\w+\.)+\w+\(
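
A quick way to try this expression from Python might look like the sketch below; `DOTTED_CALL` and `find_dotted_calls` are made-up names for illustration. Since the pattern requires at least one dot, a bare call such as `foo(1)` is not matched, and the complications listed above still apply.

```python
import re

DOTTED_CALL = re.compile(r'(?:\w+\.)+\w+\(')


def find_dotted_calls(line):
    # Each match ends with the opening parenthesis; add the closing one.
    return [match + ')' for match in DOTTED_CALL.findall(line)]


line = 'something = a.b.method(time.time(), var=1) + q.y(x.m())'
print(find_dotted_calls(line))

# A call-like fragment inside a comment is still reported.
print(find_dotted_calls('# old: a.b(1)'))
```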
Qtax
0


Here is an example showing that this is doable in Python 3:

    import re


    def parse_func_with_params(inp, max_params=50):
        # max_params is an added safety bound (not in the original version)
        # so that input which never matches cannot loop forever.
        func_params_limiter = ","
        func_name = r"([a-zA-Z]+)\s*"
        func_params_adder = r"\s*([a-zA-Z]+)\s*"
        func_current_param = func_params_adder

        # Try a pattern with 1, 2, 3, ... parameters until one matches.
        for _ in range(max_params):
            p = re.compile(func_name + r"\(" + func_current_param + r"\)")
            m = p.match(inp)
            if m:
                print(m.groups())
                return
            func_current_param += func_params_limiter + func_params_adder

Command line Input: animalFunc(lion, tiger, giraffe, singe)
Output: ('animalFunc', 'lion', 'tiger', 'giraffe', 'singe')

As you can see, the function name is always the first element of the tuple, and the rest are the names of the parameters passed.


OmarSSelim