290

As far as I know there is no such thing as named capturing groups in JavaScript. What is the alternative way to get similar functionality?

mmierins
  • 1
    Capture groups in javascript are by number ..$1 is the first captured group, $2, $3 ... up to $99 but it sounds like you want something else -- which doesn't exist – Erik Mar 20 '11 at 08:09
  • 26
    @Erik you're talking about _numbered_ capturing groups, the OP's talking about _named_ capturing groups. They exist, but we want to know if there's support for them in JS. – Alba Mendez Nov 09 '12 at 15:41
  • 5
    There's a [proposal to bring named regex into JavaScript](https://github.com/littledan/es-regexp-named-groups), but it might be years before we see that, if we ever do. – fregante Oct 11 '16 at 05:12
  • Firefox punished me for trying to use named capture groups on a website... my own fault really. https://stackoverflow.com/a/58221254/782034 – Nick Grealy Oct 03 '19 at 14:24

10 Answers

236

ECMAScript 2018 introduces named capturing groups into JavaScript regexes.

Example:

  const auth = 'Bearer AUTHORIZATION_TOKEN'
  const { groups: { token } } = /Bearer (?<token>[^ $]*)/.exec(auth)
  console.log(token) // "AUTHORIZATION_TOKEN"

If you need to support older browsers, you can do everything with normal (numbered) capturing groups that you can do with named capturing groups; you just need to keep track of the numbers, which may be cumbersome if the order of the capturing groups in your regex changes.

There are only two "structural" advantages of named capturing groups I can think of:

  1. In some regex flavors (.NET and JGSoft, as far as I know), you can use the same name for different groups in your regex (see here for an example where this matters). But most regex flavors do not support this functionality anyway.

  2. If you need to refer to numbered capturing groups in a situation where they are surrounded by digits, you can run into a problem. Let's say you want to add a zero to a digit and therefore want to replace (\d) with $10. In JavaScript, this will work (as long as you have fewer than 10 capturing groups in your regex), but Perl will think you're looking for backreference number 10 instead of number 1, followed by a 0. In Perl, you can use ${1}0 in this case.
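A small sketch of the digit-adjacency point in item 2, assuming an ES2018+ engine for the named form. The group name `d` is made up for illustration.

```javascript
// Only one capturing group exists, so "$10" is read as "$1" followed by a literal "0".
const numbered = '5'.replace(/(\d)/, '$10');

// ES2018+: the named form "$<d>" has an explicit end, so a digit can follow it safely
// no matter how many groups the regex has.
const named = '5'.replace(/(?<d>\d)/, '$<d>0');

console.log(numbered, named);
```

The named form stays unambiguous even if the regex later grows past nine groups.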

Other than that, named capturing groups are just "syntactic sugar". It helps to use capturing groups only when you really need them and to use non-capturing groups (?:...) in all other circumstances.

The bigger problem (in my opinion) with JavaScript is that it does not support verbose regexes which would make the creation of readable, complex regular expressions a lot easier.

Steve Levithan's XRegExp library solves these problems.

Roko C. Buljan
Tim Pietzcker
  • 5
    Many flavors allow using the same capturing group name multiple times in a regex. But only .NET and Perl 5.10+ make this especially useful by keeping the value captured by the last group of a name that participated in the match. – slevithan Jun 01 '12 at 03:36
  • 114
    The huge advantage is: you can just change your RegExp, no number-to-variable mapping. Non-capturing groups solve this problem, except in one case: **what if the order of the groups changes?** Also, it's annoying to put these extra chars on the other groups... – Alba Mendez Nov 09 '12 at 15:45
  • 65
    The so called **syntactic sugar** _does_ help sweeten the code readability! – Mrchief Jul 31 '13 at 18:32
  • It will fill the unmatched ones with undefined. Some languages won't, forcing you to use named groups. – jgmjgm Sep 10 '15 at 01:37
  • 2
    I think there is another reason for named capturing groups that is really valuable. For example, if you want to use a regex to parse a date from a string, you could write a flexible function that takes the value and the regex. As long as the regex has named captures for the year, month and date you could run through an array of regular expressions with minimal code. – Dewey Vozel Jan 18 '16 at 20:33
  • @DeweyVozel chances are, testing an array of regexes on dates is not going to be good practice because 01/02/03, or even 01/02/2003 is ambiguous, but even then, not having to know the order saves code. Some of us are even generating our regexes from strings. – Adam Leggett Jan 09 '19 at 15:37
  • 4
    As of October 2019, Firefox, IE 11 and Microsoft Edge (pre-Chromium) do not support named group captures. Most other browsers (even Opera and Samsung mobile) do. https://caniuse.com/#feat=mdn-javascript_builtins_regexp_named_capture_groups – JDB Oct 02 '19 at 17:02
  • 1
    Firefox doesn't support this yet. Wait for this to be resolved https://bugzilla.mozilla.org/show_bug.cgi?id=1362154 – fregante Nov 26 '19 at 11:37
  • Firefox issue 1362154 was marked as resolved 2020-05-22. Target milestone at the time of this writing is mozilla78. – ManicDee May 29 '20 at 00:47
  • 1
    `const token = /Bearer (?<token>[^ $]*)/.exec(auth).groups.token` is more readable – run_the_race Feb 05 '22 at 21:04
71

Another possible solution: create an object containing the group names and indexes.

var regex = new RegExp("(.*) (.*)");
var regexGroups = { FirstName: 1, LastName: 2 };

Then, use the object keys to reference the groups:

var m = regex.exec("John Smith");
var f = m[regexGroups.FirstName];

This improves the readability/quality of the code using the results of the regex, but not the readability of the regex itself.

Mr. TA
66

In ES6 you can use array destructuring to catch your groups:

let text = '27 months';
let regex = /(\d+)\s*(days?|months?|years?)/;
let [, count, unit] = regex.exec(text) || [];

// count === '27'
// unit === 'months'

Notice:

  • the first comma in the last let skips the first value of the resulting array, which is the whole matched string
  • the || [] after .exec() will prevent a destructuring error when there are no matches (because .exec() will return null)
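The same destructuring pattern can be applied to every match in a string. This is a sketch that assumes `String.prototype.matchAll` (ES2020) is available; the sample text and the `parts` array are invented for illustration.

```javascript
const text = 'Due in 27 months, or 3 years and 10 days';
// matchAll() requires the "g" flag and throws a TypeError without it.
const regex = /(\d+)\s*(days?|months?|years?)/g;

const parts = [];
// Each iteration yields one match array; the leading comma skips the whole match.
for (const [, count, unit] of text.matchAll(regex)) {
  parts.push({ count: Number(count), unit });
}

console.log(parts);
```

Because `matchAll` returns an empty iterator when nothing matches, no `|| []` guard is needed here.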
fregante
  • 1
    The first comma is because the first element of the array returned by match is the input expression, right? – Emilio Grisolía Jul 31 '16 at 01:04
  • 1
    `String.prototype.match` returns an array with: the whole matched string at position 0, then any groups after that. The first comma says "skip the element at position 0" – fregante Jul 31 '16 at 06:34
  • 2
    My favorite answer here for those with transpiling or ES6+ targets. This doesn't necessarily prevent inconsistency errors as well as named indices could if e.g. a reused regex changes, but I think the conciseness here easily makes up for that. I've opted for `RegExp.prototype.exec` over `String.prototype.match` in places where the string may be `null` or `undefined`. – Mike Hill Jul 31 '17 at 15:29
63

You can use XRegExp, an augmented, extensible, cross-browser implementation of regular expressions, including support for additional syntax, flags, and methods:

  • Adds new regex and replacement text syntax, including comprehensive support for named capture.
  • Adds two new regex flags: s, to make dot match all characters (aka dotall or singleline mode), and x, for free-spacing and comments (aka extended mode).
  • Provides a suite of functions and methods that make complex regex processing a breeze.
  • Automagically fixes the most commonly encountered cross-browser inconsistencies in regex behavior and syntax.
  • Lets you easily create and use plugins that add new syntax and flags to XRegExp's regular expression language.
Barett
Yunga Palatino
26

Update: It finally made it into JavaScript (ECMAScript 2018)!


Named capturing groups could make it into JavaScript very soon.
The proposal for it is at stage 3 already.

A capture group can be given a name inside angle brackets using the (?<name>...) syntax, for any identifier name. The regular expression for a date can then be written as /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u. Each name should be unique and follow the grammar for ECMAScript IdentifierName.

Named groups can be accessed from properties of a groups property of the regular expression result. Numbered references to the groups are also created, just as for non-named groups. For example:

let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = re.exec('2015-01-02');
// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';

// result[0] === '2015-01-02';
// result[1] === '2015';
// result[2] === '01';
// result[3] === '02';
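Named groups are also usable in replacements. The sketch below assumes an ES2018+ engine and shows two ways to reorder the date parts: a replacer function, which receives the `groups` object as its last argument when the regex has named groups, and the `$<name>` replacement syntax. The variable names are illustrative.

```javascript
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// Replacer function: the groups object arrives as the final argument.
const viaCallback = '2015-01-02'.replace(re, (...args) => {
  const { year, month, day } = args[args.length - 1];
  return `${day}/${month}/${year}`;
});

// Replacement string: $<name> refers to a named group.
const viaTemplate = '2015-01-02'.replace(re, '$<day>/$<month>/$<year>');

console.log(viaCallback, viaTemplate);
```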
Forivin
  • It's a stage 4 proposal at this time. – GOTO 0 May 24 '19 at 20:07
  • 1
    If you're using ES2018, you might as well go all in with destructuring: `let {year, month, day} = ((result) => ((result) ? result.groups : {}))(re.exec('2015-01-02'));` – Hashbrown Jan 31 '20 at 07:59
  • 1
    Might as well go all in with null-coalescing as well (in case named capturing groups ever work): `let {year, month, day} = {...re.exec('2015-01-02')?.groups};` – Robert Sep 11 '20 at 22:11
10

As Tim Pietzcker said, ECMAScript 2018 introduces named capturing groups into JavaScript regexes. But what I did not find in the above answers was how to use a named captured group inside the regex itself.

You can refer back to a named captured group with this syntax: \k<name>. For example:

var regexObj = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/

And as Forivin said, you can read the captured groups from the result object as follows:

let result = regexObj.exec('2019-28-06 year is 2019');
// result.groups.year === '2019';
// result.groups.month === '06';
// result.groups.day === '28';
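Another common use of a `\k<name>` backreference is spotting a repeated word. This is a sketch assuming an ES2018+ engine; the group name `word` and the sample sentences are made up.

```javascript
// \k<word> must match the same text that the "word" group captured.
const doubled = /\b(?<word>\w+)\s+\k<word>\b/i;

const hit = doubled.exec('this is is a test');
const miss = doubled.exec('no repeats here');

console.log(hit && hit.groups.word, miss);
```

The leading `\b` keeps the pattern from matching the "is" inside "this".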

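  // No "g" flag here: a global regex remembers lastIndex between exec() calls,
  // so every second click on the button would fail to match.
  var regexObj = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/mi;

function check(){
    var inp = document.getElementById("tinput").value;
    let result = regexObj.exec(inp);
    // exec() returns null when the input does not match; show empty cells then
    let groups = result ? result.groups : { year: "", month: "", day: "" };
    document.getElementById("year").innerHTML = groups.year;
    document.getElementById("month").innerHTML = groups.month;
    document.getElementById("day").innerHTML = groups.day;
}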
td, th{
  border: solid 2px #ccc;
}
<input id="tinput" type="text" value="2019-28-06 year is 2019"/>
<br/>
<br/>
<span>Pattern: "(?&lt;year&gt;\d{4})-(?&lt;day&gt;\d{2})-(?&lt;month&gt;\d{2}) year is \k&lt;year&gt;"</span>
<br/>
<br/>
<button onclick="check()">Check!</button>
<br/>
<br/>
<table>
  <thead>
    <tr>
      <th>
        <span>Year</span>
      </th>
      <th>
        <span>Month</span>
      </th>
      <th>
        <span>Day</span>
      </th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>
        <span id="year"></span>
      </td>
      <td>
        <span id="month"></span>
      </td>
      <td>
        <span id="day"></span>
      </td>
    </tr>
  </tbody>
</table>
Hamed Mahdizadeh
6

Naming captured groups provides one thing: less confusion with complex regular expressions.

It really depends on your use-case but maybe pretty-printing your regex could help.

Or you could try and define constants to refer to your captured groups.

Comments might then also help to show others who read your code, what you have done.

For the rest I must agree with Tim's answer.

Yashima
5

There is a node.js library called named-regexp that you could use in your node.js projects (or in the browser by packaging the library with browserify or other packaging scripts). However, the library cannot be used with regular expressions that contain non-named capturing groups.

If you count the opening parentheses of the capturing groups in your regular expression, you can create a mapping between the named capturing groups and the numbered capturing groups in your regex, and can mix and match freely. You just have to remove the group names before using the regex. I've written three functions that demonstrate that. See this gist: https://gist.github.com/gbirke/2cc2370135b665eee3ef
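The idea can be sketched roughly as follows. This is not the code from the gist: `compileNamed` and `execNamed` are hypothetical helpers written for illustration, and the scanner only handles escapes, character classes, and `(?...)` constructs in a simple way.

```javascript
// Strip (?<name>...) markers from a pattern string and record which
// numbered group each name corresponds to.
function compileNamed(source, flags) {
  const names = {};
  let out = '';
  let groupIndex = 0;
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {               // escaped character: copy it untouched
      out += ch + (source[i + 1] || '');
      i++;
      continue;
    }
    if (inClass) {                   // inside [...], parentheses are literal
      if (ch === ']') inClass = false;
      out += ch;
      continue;
    }
    if (ch === '[') { inClass = true; out += ch; continue; }
    if (ch === '(') {
      const named = /^\(\?<([A-Za-z_$][\w$]*)>/.exec(source.slice(i));
      if (named) {                   // named group: count it and drop the name
        groupIndex++;
        names[named[1]] = groupIndex;
        out += '(';
        i += named[0].length - 1;
        continue;
      }
      if (source[i + 1] !== '?') groupIndex++; // plain numbered group
    }
    out += ch;
  }
  return { regex: new RegExp(out, flags), names };
}

// Run the stripped regex and rebuild an object keyed by group name.
function execNamed(compiled, text) {
  const m = compiled.regex.exec(text);
  if (!m) return null;
  const groups = {};
  for (const name of Object.keys(compiled.names)) {
    groups[name] = m[compiled.names[name]];
  }
  return groups;
}

const compiled = compileNamed('(\\d+)-(?<month>\\d{2})-(?:x)?(?<day>\\d{2})');
const parsed = execNamed(compiled, '2015-01-02');
console.log(compiled.names, parsed);
```

Because unnamed capturing groups are counted too, named and numbered groups can be mixed in one pattern.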

chiborg
3

Don't have ECMAScript 2018?

My goal was to make it work as similarly as possible to what we are used to with named groups. Whereas in ECMAScript 2018 you can place ?<groupname> inside the group to indicate a named group, in my solution for older JavaScript you can place (?!=<groupname>) inside the group to do the same thing. So it's an extra set of parentheses and an extra !=. Pretty close!

I wrapped all of it into a string prototype function

Features

  • works with older javascript
  • no extra code
  • pretty simple to use
  • Regex still works
  • groups are documented within the regex itself
  • group names can have spaces
  • returns object with results

Instructions

  • place (?!=<groupname>) inside each group you want to name
  • remember to make any group you do not want named non-capturing by putting ?: at the beginning of that group. Non-capturing groups won't be named, and they won't shift the numbering of the named ones.

arrays.js

// @@pattern - includes injections of (?!=<groupname>) for each group
// @@returns - an object with a property for each group having the group's match as the value,
//             or null when the string does not match the pattern
String.prototype.matchWithGroups = function (pattern) {
  var matches = this.match(pattern);
  // match() returns null when nothing matches
  if (!matches) return null;
  return pattern
  // get the pattern as a string
  .toString()
  // suss out the groups
  .match(/<(.+?)>/g)
  // remove the angle brackets
  .map(function(group) {
    return group.match(/<(.+)>/)[1];
  })
  // create an object with a property for each group having the group's match as the value
  .reduce(function(acc, curr, index) {
    acc[curr] = matches[index + 1];
    return acc;
  }, {});
};

usage

function testRegGroups() {
  var s = '123 Main St';
  var pattern = /((?!=<house number>)\d+)\s((?!=<street name>)\w+)\s((?!=<street type>)\w+)/;
  var o = s.matchWithGroups(pattern); // {'house number':"123", 'street name':"Main", 'street type':"St"}
  var j = JSON.stringify(o);
  var housenum = o['house number']; // 123
}

result of o

{
  "house number": "123",
  "street name": "Main",
  "street type": "St"
}
toddmo
  • Leaving an obligatory “modifying the prototype of the global String class in JavaScript is a very bad idea,” even if the result is pretty cool. – brainkim Jun 09 '22 at 17:37
2

While you can't do this with vanilla JavaScript before ES2018, you can use an Array.prototype function such as Array.prototype.reduce to turn indexed matches into named ones.

Obviously, the following solution requires the names to be listed in the same order as the groups appear in the regex:

// @text Contains the text to match
// @regex A regular expression object (f.e. /.+/)
// @matchNames An array of literal strings where each item
//             is the name of each group
function namedRegexMatch(text, regex, matchNames) {
  var matches = regex.exec(text);
  // exec() returns null when the text does not match
  if (!matches) return null;

  return matches.reduce(function(result, match, index) {
    if (index > 0)
      // This subtraction is required because we count
      // match indexes from 1, because 0 is the entire matched string
      result[matchNames[index - 1]] = match;

    return result;
  }, {});
}

var myString = "Hello Alex, I am John";

var namedMatches = namedRegexMatch(
  myString,
  /Hello ([a-z]+), I am ([a-z]+)/i, 
  ["firstPersonName", "secondPersonName"]
);

alert(JSON.stringify(namedMatches));
Matías Fidemraizer
  • That's pretty cool. I'm just thinking.. wouldn't it be possible to create a regex function that accepts a custom regex? So that you could go like `var assocArray = Regex("hello alex, I am dennis", "hello ({hisName}.+), I am ({yourName}.+)");` – Forivin Aug 29 '15 at 16:46
  • @Forivin Clearly you can go further and develop this feature. It wouldn't be hard to get it working :D – Matías Fidemraizer Aug 29 '15 at 19:38
  • You can extend the `RegExp` object by adding a function to its prototype. – Mr. TA Feb 16 '16 at 19:27
  • @Mr.TA AFAIK, it's not recommended to extend built-in objects – Matías Fidemraizer Feb 16 '16 at 19:50