
I need to find all function signatures accepting more than X arguments (say 2). I tried something like `function\s*\((\w*,){3,10}` (which I expected to catch all signatures with 3 to 10 arguments), but it did not work. Variations on it are yielding unexpected results. I guess I'm just not that good at regex, but any help is appreciated.

Update: I should point out that I am writing a sort of code inspection tool. Among other things, I want to spot functions that accept more than 2 arguments (I promote the use of functions with few arguments, and only 1 argument for constructors). So I cannot rely on runtime checks such as `arguments.length`.
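A quick check of the pattern from the question, sketched for Node.js (the sample strings are made up for illustration), shows where it goes wrong: `(\w*,){3,10}` counts `word,` chunks, so it needs at least three commas (four parameters), it cannot get past a space after a comma, and `function\s*\(` leaves no room for a function name.

```javascript
// The pattern from the question, unchanged: 3 to 10 "word," chunks right after "(".
const attempt = /function\s*\((\w*,){3,10}/;

const samples = [
  "function(a,b,c)",       // 3 params but only 2 commas
  "function(a,b,c,d)",     // 4 params, 3 commas
  "function(a, b, c, d)",  // the space after a comma is neither \w nor ","
  "function foo(a,b,c,d)", // a name between "function" and "("
];

for (const s of samples) {
  // test() reports whether the pattern matches anywhere in s.
  console.log(s.padEnd(24), attempt.test(s));
}
```

Only the second sample matches, which explains the "unexpected results".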

Joe Minichino
  • Functions have a `length` property... – elclanrs May 12 '15 at 20:08
  • Well, *JavaScript*, like most programming languages, is not a *regular language*. As a result, a regex cannot fully capture the language. You need a *context-free grammar* to do this... – Willem Van Onsem May 12 '15 at 20:10
  • You don't need a CFG to get a reasonably good argument parser... I think you're just missing some optional whitespace after the `(\w*)` group, and you need to make the comma optional as well (so, `(\w*,?)`) so that it matches the single-argument case. If you're looking for non-anonymous functions, you'll also need a `\w+` in there for the name. – a p May 12 '15 at 20:13
  • @elclanrs even if he wants to find the methods, that does not mean they belong to the current script. He could be trying to find certain methods in his PHP files (Javadoc-style). – dognose May 12 '15 at 20:15
  • you need to put the `\w`, the comma and the space in a character class, put that inside the parentheses, and end with an escaped closing paren – dandavis May 12 '15 at 20:16
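For functions that can actually be loaded and evaluated (which, per the update, a static inspection tool cannot rely on), the `length` property mentioned in the first comment reports the declared parameter count. A small sketch with made-up function names; note that `length` stops counting at the first parameter with a default value and ignores rest parameters, so it is not a perfect proxy for "number of arguments" either:

```javascript
function three(a, b, c) {}
function withDefault(a, b = 1, c) {} // counting stops before "b = 1"
function withRest(a, ...rest) {}     // the rest parameter is not counted

console.log(three.length);       // 3
console.log(withDefault.length); // 1
console.log(withRest.length);    // 1

// Flag loaded functions that declare more than two parameters.
const flagged = [three, withDefault, withRest]
  .filter((fn) => fn.length > 2)
  .map((fn) => fn.name);
console.log(flagged); // [ 'three' ]
```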

3 Answers


Just think "easy":

  • A method declaration has its parameters in brackets (...): `\(\)`
  • A method with 3 parameters has 2 commas inside the brackets: `\(,{2,2}\)`
  • Each comma NEEDS to be preceded AND followed by a word: `\((?:\w+,\w+){2,2}\)`
  • That does not work, because a character cannot be matched twice: in `a,b,c` the `b` would have to belong to both repetitions. So let's make the leading word mandatory and the following one optional, but the list still has to end with a word: `\((?:\w+,\w*){2,2}\w+\)`
  • Usually a method declaration starts with `function` and the function name: `function\s+\w+\s*\((?:\w+,\w*){2,2}\w+\)`
  • Finally, there can be whitespace around the parameters: `function\s+\w+\s*\((?:\s*\w+\s*,\s*\w*\s*){2,2}\w+\s*\)`
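Each step in the list can be checked on its own. A minimal Node.js sketch (sample inputs made up for illustration) runs every intermediate pattern against the same strings. `RegExp.prototype.test` looks for a match anywhere in the string, which is why the bracket-only pattern of step 3 also fires on a full declaration:

```javascript
// The intermediate patterns from the list above, in order.
const steps = [
  ["1: commas only", /\(,{2,2}\)/],
  ["2: word,word pairs", /\((?:\w+,\w+){2,2}\)/],
  ["3: optional trailing word", /\((?:\w+,\w*){2,2}\w+\)/],
  ["4: keyword and name", /function\s+\w+\s*\((?:\w+,\w*){2,2}\w+\)/],
  ["5: whitespace allowed", /function\s+\w+\s*\((?:\s*\w+\s*,\s*\w*\s*){2,2}\w+\s*\)/],
];

const inputs = ["(,,)", "(a,b,c)", "function foo(a,b,c)", "function foo(a, b, c)"];

for (const [label, re] of steps) {
  // Keep only the inputs this step's pattern matches somewhere.
  const hits = inputs.filter((s) => re.test(s));
  console.log(label.padEnd(28), hits);
}
```

Step 2 matches nothing (the "matched twice" problem), and only step 5 accepts the declaration with spaces after the commas.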

There you go. This should cover all "common" method declarations, except nameless (anonymous) functions:

`function\s+\w+\s*\((?:\s*\w+\s*,\s*\w*\s*){2,2}\w+\s*\)`

Matching exactly two commas (`{2,2}`) finds signatures with 3 parameters. Matching two to five commas (`{2,5}`) finds signatures with 3 to 6 parameters.
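Because every repetition of the group contains exactly one comma, *n* to *m* parameters correspond to the quantifier `{n-1,m-1}`. The helper `signatureRegex` below is a hypothetical name (not part of the answer) that plugs a parameter range into the final pattern; it is a sketch for Node.js 12+ (for `String.prototype.matchAll`), scanning a made-up source string:

```javascript
// Hypothetical helper: the answer's final pattern for minParams..maxParams parameters.
// Assumes 1 <= minParams <= maxParams; pass Infinity for "or more".
function signatureRegex(minParams, maxParams) {
  const lo = minParams - 1; // one comma per repetition
  const hi = maxParams === Infinity ? "" : String(maxParams - 1);
  return new RegExp(
    String.raw`function\s+\w+\s*\((?:\s*\w+\s*,\s*\w*\s*){${lo},${hi}}\w+\s*\)`,
    "g" // matchAll requires the global flag
  );
}

const source = `
function one(a) {}
function three(a, b, c) {}
function four(a, b, c, d) {}
function six(a, b, c, d, e, f) {}
`;

// Names of all functions whose signature matches the given pattern.
const names = (re) => [...source.matchAll(re)].map((m) => m[0].match(/function\s+(\w+)/)[1]);

console.log(names(signatureRegex(3, 3)));        // exactly 3 parameters
console.log(names(signatureRegex(3, 6)));        // 3 to 6 parameters
console.log(names(signatureRegex(3, Infinity))); // 3 or more parameters
```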

dognose
  • One comment: this will allow `function test (foo, qux quux, test test2)` (so parameters can be split into two words). Furthermore, it doesn't match functions with 4 parameters. Given that the JS code is valid, that's not really a problem. But this is indeed a good solution. – Willem Van Onsem May 12 '15 at 20:53
  • This is a very clear answer, which I upvoted, but it just so happens that the answer from CommuSoft produced more positives. In light of the facts, I accepted his answer as correct, but thanks a lot for the pointers! – Joe Minichino May 12 '15 at 20:55
  • @JoeMinichino: I think by replacing `{2,2}` with `{2,}` and `\w+\s*` with `.*`, it will work correctly... – Willem Van Onsem May 12 '15 at 20:55
  • @CommuSoft for EXACTLY 4 parameters it would be `{3,3}`, and for *two to four* `{1,3}`. You are right about the "false positives", but I assumed the JavaScript to be valid. – dognose May 12 '15 at 20:57
  • @CommuSoft Not offended :-) Just wanted to point out that it's an *example*, not the overall solution. – dognose May 12 '15 at 20:59

First of all, JavaScript is not a regular language. As a result, a regex cannot fully capture the language, so you may get either false positives or false negatives.

A regex that probably comes close is:

`function(?:\s+\w+)*\s*\(([^),]*)(\s*,\s*[^),]*){2,}\)`

The regex works as follows:

  • `function` matches the `function` keyword.
  • Next there is the group `(?:\s+\w+)*`, which matches the name of the function. Since a function can be anonymous (defined without a name), the group must be optional; the `*` allows it to match zero times.
  • Next, `\s*\(` matches any amount of whitespace followed by the bracket that opens the parameter list.
  • Between the brackets, we start looking for the parameters. To cover (most) cases, we define a parameter as `[^),]*` (a sequence of characters containing neither a comma nor a closing bracket).
  • For each further parameter we need to skip a comma, which the `\s*,\s*` pattern enforces (the `\s*` parts are actually unnecessary, since `[^),]*` also matches whitespace). After the comma comes another parameter, and the group `(\s*,\s*[^),]*)` must repeat at least twice (`{2,}`), so at least two commas are skipped.
  • Finally, `\)` matches the closing bracket.
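The walkthrough can be exercised directly. A sketch for Node.js with made-up inputs, using the answer's regex unchanged: it accepts named and anonymous functions with three or more parameters, rejects two parameters, and, since `[^),]*` and `\s*` also match newlines, handles parameter lists split across lines.

```javascript
// The regex from this answer, unchanged.
const moreThanTwo = /function(?:\s+\w+)*\s*\(([^),]*)(\s*,\s*[^),]*){2,}\)/;

const cases = {
  named: "function foo(a, b, c) {}",
  anonymous: "var f = function (a, b, c) {};",
  twoParams: "function bar(a, b) {}",
  fourParams: "function qux(a, b, c, d) {}",
  multiline: "function baz(a,\n  b,\n  c) {}",
};

for (const [label, src] of Object.entries(cases)) {
  // test() reports whether the pattern matches anywhere in src.
  console.log(label.padEnd(12), moreThanTwo.test(src));
}
```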
Willem Van Onsem
  • how would you get false negatives? I can see functions in strings/comments giving a false positive, but functions themselves have to follow a specific grammar, one that can be defined by a regexp, right? – dandavis May 12 '15 at 20:18
  • well... he does not want to write a regex-based JavaScript parser, just find method signatures... – dognose May 12 '15 at 20:18
  • *Again*, the **language** *itself* _is not_ ***regular***, but that doesn't mean *that* a subsection of it, with constraints, is not regularly parseable. – a p May 12 '15 at 20:18
  • @ap: what is nowadays called a regex is not a regular expression in the theoretical sense. – Casimir et Hippolyte May 12 '15 at 20:23
  • For instance, if the function is part of a string (e.g. `"function foobar (a,b,c,d)"`)... – Willem Van Onsem May 12 '15 at 20:23
  • @CasimiretHippolyte Absolutely true, and all the more reason that CommuSoft is wrong. I didn't want to add unnecessary confusion though ;) – a p May 12 '15 at 20:25
  • @CasimiretHippolyte: well, Perl does indeed allow designing context-free regexes and even Turing-complete ones. C# allows counting. But the Java regex engine still maps to the original regular expressions. – Willem Van Onsem May 12 '15 at 20:26
  • No, because even though JavaScript regexes (for example) lack tools like recursion or balancing groups, the simple fact that they have lookaheads and backreferences makes them able to describe non-regular languages (though not all languages). – Casimir et Hippolyte May 12 '15 at 20:29
  • @CasimiretHippolyte: and as an example of a false positive: `function foo (function bar (a,b,c)`: this is simply invalid JavaScript, but a regex engine cannot keep track of which contexts are still open. – Willem Van Onsem May 12 '15 at 20:30
  • @CasimiretHippolyte: see [here](http://stackoverflow.com/questions/2974210/does-lookaround-affect-which-languages-can-be-matched-by-regular-expressions): *regular languages are closed under lookahead*. Although it makes nice syntactic sugar, it doesn't *add* any power to regular expressions. The same holds for character classes, optional groups, ... Indeed, the strict definition of a regex allows, for instance, the Kleene star, but most non-Perl extensions don't add fundamental power. – Willem Van Onsem May 12 '15 at 20:34
  • You are right, my mistake, but what about backreferences? – Casimir et Hippolyte May 12 '15 at 20:35
  • @CasimiretHippolyte: A backreference is indeed not regular. The nice thing is that one can slightly adapt the definition of a DFA such that it is still evaluated in linear time however. – Willem Van Onsem May 12 '15 at 20:40
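The false positives discussed in these comments are easy to reproduce with the regex from the answer above (copied unchanged so the sketch runs on its own; the inputs are the examples from the comments, wrapped in made-up code). Note that for the invalid nesting the match actually starts at the outer `function foo`, and `[^),]*` swallows `function bar (a` as its first "parameter":

```javascript
// The regex from the answer above, unchanged.
const answerRegex = /function(?:\s+\w+)*\s*\(([^),]*)(\s*,\s*[^),]*){2,}\)/;

// A signature inside a string literal is not a declaration, but it still matches.
const inString = 'const doc = "function foobar (a,b,c,d)";';
// Invalid JavaScript: the outer "(" is never closed.
const invalidNesting = "function foo (function bar (a,b,c)";

const m1 = inString.match(answerRegex);
console.log(m1[0]); // function foobar (a,b,c,d)

const m2 = invalidNesting.match(answerRegex);
console.log(m2[0]); // function foo (function bar (a,b,c)
console.log(m2[1]); // function bar (a
```

Avoiding these would require tracking strings, comments and open brackets, which is exactly where a real parser is needed.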

You'd want to use `function\s*\w+\s*\(\s*(\w+,?){3,10}` to match non-anonymous (named) functions, and remove the `\w+\s*` to get `function\s*\(\s*(\w+,?){3,10}` for anonymous functions.

These can be combined to get `function\s*(?:\w+\s*)?\(\s*(\w+,?){3,10}` (the `?:` makes the group non-capturing).

a p
  • This will match `function(fobar)` as well since the comma is optional. – Willem Van Onsem May 12 '15 at 20:20
  • @CommuSoft you're wrong, it doesn't, because of the minimum repetition count in `{3,10}`. It's incorrect for other reasons, but you seem to have missed them. – a p May 12 '15 at 20:22
  • https://regex101.com/r/hG0mS0/1. You repeat `(\w+,?)` three or more times, so each repetition matches only a single character (`f`, `o` and `o`), since the comma is optional **every** time. It is never matched. – Willem Van Onsem May 12 '15 at 20:24
  • And about those other reasons: you indeed forgot to allow spaces before and after the comma... – Willem Van Onsem May 12 '15 at 21:57
  • You're pretty mad about being wrong earlier, eh? It's cool, just let it go, man. – a p May 12 '15 at 22:21
  • not at all: (1) you posted an answer with an error, and when the error was pointed out (and proven here with two examples), you didn't fix it. (2) In the discussion I've given two false positives. One can indeed be resolved, but the case of `function x (function y (function z()))`, which is simply invalid JavaScript syntax, cannot be resolved (at least not easily): it will match the inner function, and that is not allowed... (3) Finally, I indeed found that your regex contains additional errors. Please fix these before people use this regex... – Willem Van Onsem May 12 '15 at 22:27
  • Your claim about a "sublanguage" is correct in the sense that the intersection of a CFG and a DFA can indeed result in a regular language (on rare occasions). But here you want to *match*, in other words find a substring. In that case there is no such cascade, since it will match invalid JavaScript as well. Because the script is invalid, these aren't parameters; the script doesn't make sense... – Willem Van Onsem May 12 '15 at 22:29
  • @CommuSoft you're hilarious, and I love it. Never change, bro. – a p May 12 '15 at 22:43
  • Of course not, but I hope your answer does ;). The question asks: match functions with more than two parameters, your regex matches the string [`function foo(bar)`](https://regex101.com/r/eS8tD3/1)... As long as this holds, unfortunately this answer is not correct and so the downvote cannot be removed ;p – Willem Van Onsem May 12 '15 at 22:44
  • *I'll* **fix** ***it*** when I **have** more ***time***, thanks ;) – a p May 12 '15 at 22:45
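The problem raised in these comments can be reproduced, and one possible repair sketched. `fixed` below is a hypothetical correction, not taken from the answer: the first parameter stands alone, every further parameter must be preceded by a comma (with optional whitespace around it), and the closing bracket is required; `{2,9}` extra parameters keeps the question's range of 3 to 10. Like every regex on this page, it still ignores strings, comments, default values and destructuring.

```javascript
// The combined pattern from this answer, unchanged.
const combined = /function\s*(?:\w+\s*)?\(\s*(\w+,?){3,10}/;

// Hypothetical correction (not from the answer): comma-separated parameters,
// whitespace allowed around commas, closing bracket required, 3 to 10 in total.
const fixed = /function(?:\s+\w+)?\s*\(\s*\w+(?:\s*,\s*\w+){2,9}\s*\)/;

const signatures = [
  "function foo(bar)",     // one parameter
  "function foo(a, b)",    // two parameters
  "function foo(a, b, c)", // three parameters, spaces after commas
  "function(a,b,c,d)",     // anonymous, four parameters
];

for (const src of signatures) {
  console.log(src.padEnd(24), "combined:", combined.test(src), "fixed:", fixed.test(src));
}
```

The original pattern accepts the one-parameter case (each `\w+` takes one letter of `bar`) and rejects the spaced three-parameter case; the sketch reverses both.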