2

I've been trying to create a regex that matches function parameters.

Here's the full sample (with debug mode): https://regex101.com/r/vM7xN1/1

The regex I currently have is this one: \(([^,\s\)]+)(?:\s*,\s*([^,\s\)]+))*\)

And the match results I'm trying to achieve:

1. someFunc(param) => ['param']

2. someFunc(param, param2) => ['param', 'param2']

3. someFunc(param, param2, param3) => ['param', 'param2', 'param3']

4. someFunc(param, param2, param3, param4) => ['param', 'param2', 'param3', 'param4']

For some reason, this only matches functions 1 and 2 correctly. For functions 3 and 4 it captures only the first and the last param.

Why is it skipping the params between the first and the last?

Edit: Additional tests:

'myFunction(param1, param2, param3)'.match(/\(([^,\s\)]+)(?:\s*,\s*([^,\s\)]+))*\)/) => ["(param1, param2, param3)", "param1", "param3"]

When trying without the non-capturing group I get this:

'myFunction(param1, param2, param3)'.match(/\(([^,\s\)]+)(\s*,\s*([^,\s\)]+))*\)/) => ["(param1, param2, param3)", "param1", ", param3", "param3"]

Any help would be great.

Thanks!

Imri Barr

3 Answers

2

Just use a simple regex to get everything between the round brackets and split on comma:

>"someFunc(param, param2, param3, param4)".match(/(?:\()(.+)+(?:\))/)
["(param, param2, param3, param4)", "param, param2, param3, param4"]
>"someFunc(param, param2, param3, param4)".match(/(?:\()(.+)+(?:\))/)[1].split(/[\s,]+/)
["param", "param2", "param3", "param4"]

Edit:
We can also filter out empty elements (if any) like this:

>"someFunc( param , param2 , param3 )".match(/(?:\()(.+)+(?:\))/)[1].split(/[\s,]+/)
["", "param", "param2", "param3", ""]
>"someFunc( param , param2 , param3 )".match(/(?:\()(.+)+(?:\))/)[1].split(/[\s,]+/).filter(function(e){return e})
["param", "param2", "param3"]
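The same match-then-split idea can be wrapped in a small function. This is only a sketch: `extractParams` is a hypothetical name, and it assumes the argument list contains no nested parentheses. Splitting on a comma with optional whitespace on both sides, after trimming, avoids the empty elements without a filter step.

```javascript
// Hypothetical helper (illustrative name, not from the answer above).
// Assumes the argument list has no nested parentheses.
function extractParams(call) {
  // Capture everything between "(" and the next ")".
  const m = call.match(/\(([^)]*)\)/);
  if (m === null) return [];

  // Trim the outer whitespace so the split cannot yield empty edge items.
  const inner = m[1].trim();
  if (inner === "") return [];

  // Split on a comma with optional whitespace on either side.
  return inner.split(/\s*,\s*/);
}

console.log(extractParams("someFunc( param , param2 , param3 )"));
// ["param", "param2", "param3"]
```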
riteshtch
  • You should include whitespace on either side of the comma in your `.split()` so the resulting matches are only the names without leading/trailing whitespace. – jfriend00 Mar 24 '16 at 07:56
  • Shouldn't it take any sequence of optional whitespace on either side of the comma? What you have works for the OP's specific test case, but not for the general syntax case such as `someFunc( param , param2 , param3 )`. – jfriend00 Mar 24 '16 at 08:05
  • I'm trying to avoid the splits for now. It's more of a regex challenge for me. – Imri Barr Mar 24 '16 at 08:19
  • @ImriBarr this gives an insight why only the last param is captured: http://stackoverflow.com/questions/3537878/how-to-capture-an-arbitrary-number-of-groups-in-javascript-regexp – riteshtch Mar 24 '16 at 08:35
1

It's not actually skipping anything.

Let's break down your regex and see what each part does for the third function, someFunc(param, param2, param3).

Regex: \(([^,\s\)]+)(?:\s*,\s*([^,\s\)]+))*\)

  • ([^,\s\)]+) matches param and captures it in group 1.

  • (?:\s*,\s*([^,\s\)]+))* matches every comma-plus-name sequence that follows, which in your case is , param2, param3.

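So the complete string matched is

(param, param2, param3)

But the inner ([^,\s\)]+) sits inside a repeated group, so each repetition overwrites what it captured before. It first captures param2 and then param3, and only the last value, param3, is kept. Both are part of the matched text , param2, param3. This is also visible in Regex101's Match Information.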
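The breakdown above can be checked directly in JavaScript. This short sketch reuses the regex and the input from the question; the variable names are just for illustration.

```javascript
// Regex from the question: group 1 is the first param,
// group 2 sits inside a repeated non-capturing group.
const re = /\(([^,\s)]+)(?:\s*,\s*([^,\s)]+))*\)/;

const m = "someFunc(param, param2, param3)".match(re);
console.log(m[0]); // "(param, param2, param3)"  whole match, nothing skipped
console.log(m[1]); // "param"                    group 1
console.log(m[2]); // "param3"                   group 2 keeps only its last repetition

// With a single param the repeated group never runs,
// so group 2 does not participate in the match.
const single = "someFunc(param)".match(re);
console.log(single[2]); // undefined
```

The whole parameter list is matched in every case; what gets lost is the intermediate value of group 2, because a group has only one slot no matter how many times it repeats.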

  • @ImriBarr Your first **captured group** is `param` and the second group is `, param2, param3`, which is not captured as a whole, but its sub-groups are. The Match Information shows the **sub-groups**. If you go to the **Substitution** section you will see that `(param, param2, param3)` is actually matched. –  Mar 24 '16 at 08:22
  • 'myFunction(param1, param2, param3)'.match(/\(([^,\s\)]+)(?:\s*,\s*([^,\s\)]+))*\)/) => ["(param1, param2, param3)", "param1", "param3"] Edit: What is the substitution section? – Imri Barr Mar 24 '16 at 08:26
  • Notice that you have used `(?:)` for the outer group, which is _non-capturing_: it is matched but not **stored** in memory for **backreferencing**. Within it you have used `()`, which is a _capturing_ group and is stored in memory for backreferencing. –  Mar 24 '16 at 08:27
  • @ImriBarr: The regex you posted in the comments is slightly different from the one in your question, and it does make a difference. I suggest you edit your question with the actual code. –  Mar 24 '16 at 08:34
  • Sorry, this one is the correct one: 'myFunction(param1, param2, param3)'.match(/\(([^,\s\)]+)(?:\s*,\s*([^,\s\)]+))*\)/) => ["(param1, param2, param3)", "param1", "param3"] and without the non capturing group: 'myFunction(param1, param2, param3)'.match(/\(([^,\s\)]+)(\s*,\s*([^,\s\)]+))*\)/) => ["(param1, param2, param3)", "param1", ", param3", "param3"] – Imri Barr Mar 24 '16 at 08:42
  • @ImriBarr: I suggest you add that to your question so everyone can see that. –  Mar 24 '16 at 08:45
  • Edited my question – Imri Barr Mar 24 '16 at 08:48
1

As noob already explained, you are matching everything but your capture group stores only the last match. See http://www.rexegg.com/regex-capture.html#spawn_groups on generating new capturing groups for further information.
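Since a repeated group keeps only its last match, one regex-only workaround (not necessarily the one described at the link) is to make each parameter its own match with a global regex instead of its own group. The sketch below assumes no nested parentheses; `paramsViaGlobalMatch` is a hypothetical name.

```javascript
// Hypothetical helper (illustrative name). Assumes no nested parentheses.
function paramsViaGlobalMatch(call) {
  // Step 1: isolate the text between "(" and the next ")".
  const list = /\(([^)]*)\)/.exec(call);
  if (list === null) return [];

  // Step 2: with the g flag, match() returns every match rather than
  // the groups of a single match, so no parameter is overwritten.
  // It returns null when nothing matches, hence the fallback.
  return list[1].match(/[^,\s]+/g) || [];
}

console.log(paramsViaGlobalMatch("someFunc(param, param2, param3, param4)"));
// ["param", "param2", "param3", "param4"]
```

This avoids `.split()` entirely, at the cost of using two regexes instead of one.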

strippenzieher