
I have been struggling to create a regular expression that distinguishes an object definition from a call to that object in Python. The purpose is syntax highlighting.

This is the situation that needs to be resolved (the numbers denote line numbers):

0    def otherfunc(vars...):
1        pass
2
3    otherfunc(vars...)

I want to match the name of the object, but not when it is preceded by `def` anywhere on the same line. The result on the above code should be:

"otherfunc", line: 3

Are regular expressions capable of doing something like this?

EDIT: I am only concerned with scanning/searching a single line at a time.

Asked by xitiru
    Yes, to an extent... however if you want it to *not* match within multiline strings and such, then it becomes much harder... – Antti Haapala -- Слава Україні Oct 28 '16 at 10:54
  • @AnttiHaapala I am only concerned with scanning a single line at a time. I shall edit the question to reflect this. – xitiru Oct 28 '16 at 11:04
  • How would you determine if it is a function? A class is callable, and so are some other objects. Do you include `lambda`s in this? – cdarke Oct 28 '16 at 11:04
  • @cdarke That is a good point. It is imprecise to call it a function. I shall edit my question to reflect this. – xitiru Oct 28 '16 at 11:05
  • Does this really need to be implemented solely with a regular expression? If you're unfamiliar with regular expressions, sometimes a more straightforward approach will do. First, search for the thing you're searching for with a simple regular expression, and then iterate over the results to throw out things you don't care about. – Bryan Oakley Oct 28 '16 at 12:35
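Bryan Oakley's two-step idea can be sketched in Python as follows. This is a minimal sketch, not code from the thread: the names `CALL_LIKE`, `DEF_WORD` and `calls_in_line` are hypothetical, "call-like" is assumed to mean "an identifier directly followed by `(`", and `vars...` from the question is replaced by concrete arguments. Like any line-by-line regex approach, it does not know about strings or comments.

```python
import re

# Step 1: a deliberately simple pattern - any identifier directly followed by "(".
CALL_LIKE = re.compile(r"[A-Za-z_]\w*(?=\()")
# Used in step 2: the word "def" on its own (not part of e.g. "undef").
DEF_WORD = re.compile(r"\bdef\b")


def calls_in_line(line):
    """Return call-like names in one line, skipping any preceded by `def`."""
    names = []
    for match in CALL_LIKE.finditer(line):
        # Step 2: throw out matches with "def" anywhere earlier on the line.
        if DEF_WORD.search(line, 0, match.start()):
            continue
        names.append(match.group())
    return names


source = """def otherfunc(a, b):
    pass

otherfunc(1, 2)"""

# Scan one line at a time, numbering lines from 0 as in the question.
for lineno, line in enumerate(source.splitlines()):
    for name in calls_in_line(line):
        print(f'"{name}", line: {lineno}')  # prints: "otherfunc", line: 3
```

Keeping the search pattern trivial and doing the rejection in ordinary code makes it easy to add more rules later (for example skipping `class` lines) without touching the regex.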

2 Answers


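You could use a negative lookbehind. It matches only if the current position is not immediately preceded by the given pattern. In your case, you are looking for `otherfunc` that is not preceded by `def` followed by a whitespace character. Note that the lookbehind only inspects the text directly before the name, not the whole line.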

I'm using PCRE syntax here.

(?<!def\s)otherfunc
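Python's built-in `re` module is not PCRE, but it accepts this pattern unchanged because `def\s` has a fixed width. A quick sketch of how it behaves (the variable name `pattern` is just for illustration); the third case shows the limitation mentioned above, since the lookbehind only looks at the four characters right before the name.

```python
import re

# Richard's pattern as-is; Python's re accepts it because "def\s" is fixed-width.
pattern = re.compile(r"(?<!def\s)otherfunc")

print(bool(pattern.search("def otherfunc(a, b):")))   # False: "def " sits right before the name
print(bool(pattern.search("otherfunc(1, 2)")))        # True: nothing precedes the name
print(bool(pattern.search("def  otherfunc(a, b):")))  # True: two spaces slip past the lookbehind
```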
Answered by Richard

I like Richard's answer; however, I would also take into consideration Python's valid function-name characters and indentation. This is what I came up with:

(?<!(def\s))(?<=^|\s)[a-zA-Z_][\w_]*(?=\()

See this working sample on Regex101.

Explanation

It matches valid Python function names if:

  1. `(?<!(def\s))` they do not follow `def` and a whitespace character,
  2. `(?<=^|\s)` they are either at the beginning of a line or follow a whitespace character (this is as close as you can get, since lookbehinds don't support variable-length patterns such as `\s*`), and
  3. `(?=\()` they are followed by an opening parenthesis `(`.

Note that I am not a Python developer, so for the sake of simplicity `[a-zA-Z_][\w_]*` matches valid Python 2.x function names. You can extend this part of the expression to Python 3.x, which also allows non-ASCII identifiers, but I have no experience with that ;)
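If you want to run the same rules with Python's own `re` module, the pattern needs two small changes, sketched below under that assumption. Python rejects `(?<=^|\s)` because its alternatives have different widths (0 and 1), so `(?<!\S)` ("not preceded by a non-whitespace character", i.e. start of the line or after whitespace) is used instead. The capturing group inside the first lookbehind is also dropped, because with `findall` a group would change what is returned. The names `CALL_NAME` and `call_names` are hypothetical.

```python
import re

# (?<!def\s)  - not directly after "def" plus one whitespace character
# (?<!\S)     - at the start of the line or after whitespace (fixed-width
#               replacement for (?<=^|\s), which Python's re rejects)
# (?=\()      - followed by an opening parenthesis
CALL_NAME = re.compile(r"(?<!def\s)(?<!\S)[A-Za-z_]\w*(?=\()")


def call_names(line):
    """Return names that start a call on this line, following the rules above."""
    return CALL_NAME.findall(line)


print(call_names("def otherfunc(a, b):"))  # []
print(call_names("    otherfunc(a, b)"))   # ['otherfunc']
# "bar" is missed because it follows "(" rather than whitespace,
# a limitation the original PCRE pattern shares.
print(call_names("x = foo(bar(1))"))       # ['foo']
```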

Answered by nozzleman