I'm editing my javascript.lang file to highlight function names. Here is the expression I am currently using for gtksourceview.

<define-regex id="function-regex" >
(?&lt;=([\.|\s]))
([a-z]\w*)
(?=([\(].*))(?=(.*[\)]))
</define-regex>

Here's the regex by itself:

(?<=([\.|\s]))([a-z]\w*)(?=([\(].*))(?=(.*[\)]))

It appears to work for simple calls such as foo(A), which I am satisfied with. Where I am having trouble is highlighting a function name inside the parentheses of another function call:

  foo(bar(A))

or, to put it more rigorously,

  foo{N}(foo{N-1}(...(foo{2}(foo{1}(A)))...))

So with the example,

  foo(bar(baz(A)))

my goal is for it to highlight foo, bar, baz and nothing else.

I don't know how to handle the bar function. I have read about recursive regex using (?R) or (?0), but I have not had any success using it to highlight nested function calls in gedit.
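
One way to see which names the expression picks up is to run it outside gedit. The sketch below uses Python's re module as a stand-in, on the assumption that it treats these lookarounds the same way as the engine behind gtksourceview. The helper name highlighted is made up for illustration. The output suggests that the lookbehind is what skips bar and baz: they are preceded by (, which is not in [\.|\s].

```python
import re

# The expression from the question, unchanged (XML escaping removed).
FUNCTION_REGEX = re.compile(r"(?<=([\.|\s]))([a-z]\w*)(?=([\(].*))(?=(.*[\)]))")

def highlighted(line):
    """Return the text captured by group 2, i.e. the would-be function names."""
    return [m.group(2) for m in FUNCTION_REGEX.finditer(line)]

# foo is preceded by a space, so it matches; bar and baz are preceded by "(".
print(highlighted(" foo(bar(baz(A)))"))   # ['foo']

# At the very start of the string nothing precedes foo, so the lookbehind fails.
print(highlighted("foo(bar(baz(A)))"))    # []

# formatStr follows "(", so only the name after "." is picked up.
print(highlighted("context.outputToDivConsole(formatStr(V),1);"))  # ['outputToDivConsole']
```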

P.S. Here are the tests that I am currently using to determine success.

initialDrawGraph(toBeSorted);   
$(element).removeClass(currentclass);
myFrame.popStack();
context.outputCurrentSortOrder(V);
myFrame.nextFunction = sorter.Sort.;
context.outputToDivConsole(formatStr(V),1);
Cœur
joelliusp

3 Answers

Matching balanced parentheses is not possible with a true regular expression, since it requires memory (see: Can regular expressions be used to match nested patterns?). Some regex implementations, however, support recursion:

Matching Balanced Constructs

The main purpose of recursion is to match balanced constructs or nested constructs. The generic regex is b(?:m|(?R))*e where b is what begins the construct, m is what can occur in the middle of the construct, and e is what can occur at the end of the construct. For correct results, no two of b, m, and e should be able to match the same text. You can use an atomic group instead of the non-capturing group for improved performance: b(?>m|(?R))*e.

A common real-world use is to match a balanced set of parentheses. \((?>[^()]|(?R))*\) matches a single pair of parentheses with any text in between, including an unlimited number of parentheses, as long as they are all properly paired. If the subject string contains unbalanced parentheses, then the first regex match is the leftmost pair of balanced parentheses, which may occur after unbalanced opening parentheses. If you want a regex that does not find any matches in a string that contains unbalanced parentheses, then you need to use a subroutine call instead of recursion. If you want to find a sequence of multiple pairs of balanced parentheses as a single match, then you also need a subroutine call.
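
Python's built-in re module has no (?R), so the following sketch swaps the recursive regex for an explicit depth counter, which is the "memory" a balanced match needs. The function name first_balanced is hypothetical. It is meant to mirror the behaviour described above: it returns the leftmost balanced pair, even when an unbalanced opening parenthesis comes first.

```python
def first_balanced(text):
    """Return the leftmost balanced '(...)' substring, or None if there is none."""
    for start, ch in enumerate(text):
        if ch != "(":
            continue
        depth = 0  # number of currently open parentheses
        for end in range(start, len(text)):
            if text[end] == "(":
                depth += 1
            elif text[end] == ")":
                depth -= 1
                if depth == 0:
                    return text[start:end + 1]
        # This "(" is never closed; try the next opening parenthesis.
    return None

print(first_balanced("foo(bar(baz(A)))"))  # (bar(baz(A)))
print(first_balanced("((a)"))              # (a)
print(first_balanced("no parens"))         # None
```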

Community
Uri Agassi

Ok, looks like I was making this more complicated than it needed to be.

I was able to achieve what I needed with this simpler regex. I just told it to stop looking for the closing parenthesis. It also no longer has the lookbehind, so a name that directly follows ( can match.

([a-zA-Z0-9][a-zA-Z0-9]*)(?=\()
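
As a rough check, the simpler expression can be run over the nested example and a few of the test lines from the question. This is a sketch using Python's re module as a stand-in for gedit's engine (an assumption), and function_names is a made-up helper name.

```python
import re

# The simpler expression from this answer, unchanged.
SIMPLE_REGEX = re.compile(r"([a-zA-Z0-9][a-zA-Z0-9]*)(?=\()")

def function_names(line):
    """Return every run of letters/digits that is directly followed by '('."""
    return SIMPLE_REGEX.findall(line)

print(function_names("foo(bar(baz(A)))"))                      # ['foo', 'bar', 'baz']
print(function_names("$(element).removeClass(currentclass);")) # ['removeClass']
print(function_names("context.outputToDivConsole(formatStr(V),1);"))
# ['outputToDivConsole', 'formatStr']
print(function_names("EU (European Union)"))                   # []
print(function_names("5+(4-3)"))                               # []

# The character class has no "_", so only the tail of such a name matches.
print(function_names("my_func(x)"))                            # ['func']
```

If names containing _ or $ matter, the character class would need to be widened accordingly.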
joelliusp
The following regex works for nested functions (Note: this is the Python flavor of regex, so you may need to make some syntax tweaks. Hopefully, you'll get the idea):

[OBSOLETED] '(\w+\()+[^\)]*\)+'

[UPDATED] (Should work, hopefully)

(\w+\()+([^\)]*\)+)*
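
To check what the updated expression actually selects, it can be run with Python's re module. In the sketch below, matched_span is a made-up helper name. It reports the full match, which covers the whole call including its arguments rather than the function names alone.

```python
import re

# The [UPDATED] expression from this answer, unchanged.
UPDATED_REGEX = re.compile(r"(\w+\()+([^\)]*\)+)*")

def matched_span(text):
    """Return the first full match of the expression, or None."""
    m = UPDATED_REGEX.search(text)
    return m.group(0) if m else None

print(matched_span("foo(bar(A))"))    # foo(bar(A))
print(matched_span("F1(F2()+F3())"))  # F1(F2()+F3())
print(matched_span("5+(4-3)"))        # None
```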

sshashank124
  • It looks like that regex has the problem where it simply highlights everything within the parentheses. – joelliusp Mar 30 '14 at 06:58
  • This solution breaks if you have anything _but_ `F1(F2(F3(...)))` - `F1(F2()+F3())` does not work for it – Uri Agassi Mar 30 '14 at 06:58
  • @joelliusp, Isn't that what you want? Or do you want to highlight only the function name? – sshashank124 Mar 30 '14 at 07:03
  • @joelliusp, I updated my answer. It works for Uri's case as well now. – sshashank124 Mar 30 '14 at 07:31
  • @sshashank124 Correct. I am trying to highlight the function name only. I just edited my question to hopefully clear up that confusion. – joelliusp Mar 30 '14 at 12:56
  • @joelliusp, So in the case of `foo(bar(baz(A)))`, would you want it to highlight only `foo`, or `foo`, `bar`, and `baz`? – sshashank124 Mar 30 '14 at 13:02
  • @sshashank124 The second one is my goal. foo, bar, and baz should all be highlighted and should be the only things highlighted. – joelliusp Mar 30 '14 at 13:08
  • @joelliusp, Sorry for so many questions, but how are you going to tell apart `foo(bar(A))`, `"EU (European Union)"`, and `5+(4-3)`? They all fit the function-regex definition, don't they? – sshashank124 Mar 30 '14 at 13:18
  • @sshashank124 I don't mind the questions. Actually, it looks like the regular expression that I included in the question already handles "EU (European Union)" and 5+(4-3). It just doesn't handle the bar in foo(bar(A)), of course. – joelliusp Mar 30 '14 at 13:36