
I'm still having a hard time understanding regex... :-/

Given strings (JavaScript-like expressions) like these...

  • foo[0]
  • foo[4][2]
  • foo[4][2][234523][3]

...I'm trying to deconstruct them with a regex, so that I have

  • the name of the variable: foo
  • the individual indices: for example 4, 2, 234523 and 3 in the last example

while not accepting invalid syntax like

  • foo[23]bar[55]
  • foo[123]bar
  • [123]bla
  • foo[urrrr]

It would be nice to also ignore whitespace, as in `foo [13]` or `foo[ 123 ]`, but that's not important.

Is that possible with regex?

I was able to extract the brackets with `var matches = s.match(/\[([0-9]?)\]/g);`, but that includes the brackets in the result, is missing the variable name (I could work around that), and does not respect the edge cases described above.
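
For reference, a minimal sketch of why that attempt behaves as described. With the `g` flag, `match` returns only the whole matches, so the capture group is not reported, and `[0-9]?` accepts at most one digit. The variable names `attempt`, `re`, and `digits` are illustrative only.

```javascript
const s = "foo[4][2][234523][3]";

// With the g flag, String.prototype.match returns full matches only;
// the capture group around the digits is dropped.
const attempt = s.match(/\[([0-9]?)\]/g);
console.log(attempt); // [ '[4]', '[2]', '[3]' ]  ([234523] is skipped: ? allows one digit at most)

// Using + instead of ? and reading the capture group through an exec loop
// yields the digits without the brackets.
const re = /\[([0-9]+)\]/g;
const digits = [];
let m;
while ((m = re.exec(s)) !== null) {
  digits.push(m[1]);
}
console.log(digits); // [ '4', '2', '234523', '3' ]
```

This still does no validation of the string as a whole; it only shows where the brackets in the result come from.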

Udo G
  • What would be considered valid syntax? For example, is `foo[23] bar[55]` valid, or does there need to be a newline in between them, or does the closing bracket need to be the last character? – Reid Horton Aug 27 '16 at 14:34
  • `foo[23]` and `bar[55]` would be two independent examples. So, a single string `foo[23] bar[55]` would be invalid. `var foo = X;` must be valid syntax when `X` is the string to be parsed. I'm actually trying to decode a **very small** subset of the JavaScript language. – Udo G Aug 27 '16 at 14:37
  • You can't parse JS with regex... – Oriol Aug 27 '16 at 15:08

3 Answers


You'll have to use loops to extract multiple matches. Here's one way:

function run(string) {
  var match;
  if(match = string.match(/^([^[]+)\s*(\[\s*(\d+)\s*\]\s*)+\s*$/)) {
    var variable = match[1], indices = [];
    var re = /\[\s*(\d+)\s*\]/g;
    while(match = re.exec(string)) {
      indices.push(+match[1]);
    }
    return { variable: variable, indices: indices };
  } else {
    return null;
  }
}

var strings = [
  "foo[0]",
  "foo[4][2]",
  "foo[4][2][234523][3]",
  "foo [13]",
  "foo[ 123 ]",
  "foo[1] [2]",
  "foo$;bar%[1]",
  // The following are invalid
  "foo[23]bar[55]",
  "foo[123]bar",
  "[123]bla",
  "foo[urrrr]",
];

// Demo
strings.forEach(function(string) {
  document.write("<pre>" + JSON.stringify(run(string), null, 4) + "</pre>");
});
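
The same two-step idea (validate the whole string first, then collect the indices) can also be sketched with `String.prototype.matchAll` instead of an `exec` loop. This is only a sketch: the function name `parseIndexed` is hypothetical, and the lazy `+?` on the name group is an assumption added here so that trailing whitespace is not captured as part of the variable name.

```javascript
// Hypothetical helper: validate the whole string, then collect every index.
function parseIndexed(string) {
  // Step 1: the name is anything without '[', followed by one or more [digits] groups.
  // The lazy +? lets the following \s* absorb whitespace before the first bracket.
  const valid = /^([^[]+?)\s*(?:\[\s*\d+\s*\]\s*)+$/.exec(string);
  if (!valid) return null;

  // Step 2: matchAll yields one match object per bracket pair, capture groups included.
  const indices = Array.from(
    string.matchAll(/\[\s*(\d+)\s*\]/g),
    (m) => Number(m[1])
  );
  return { variable: valid[1], indices: indices };
}

console.log(parseIndexed("foo [13]"));       // { variable: 'foo', indices: [ 13 ] }
console.log(parseIndexed("foo[23]bar[55]")); // null
```

`matchAll` requires a regex with the `g` flag and is available in ES2020 environments; in older ones the `exec` loop above does the same job.
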
Dogbert
  • @UdoG if you're trying to parse more complex expressions than this, you might want to look at PEG.js or another JS parser generator or combinator library. – Dogbert Aug 27 '16 at 14:40
  • Thanks, but I don't need more complex expressions. Since you also included support for whitespace: is it possible to allow a string like `foo[1] [2]` (whitespace between the brackets)? – Udo G Aug 27 '16 at 14:42
  • Also, is it possible to accept *anything* before the first bracket, even `foo$;bar%`? Yes, I wasn't clear about that in my question, sorry. – Udo G Aug 27 '16 at 14:44

That is not possible.

You can test whether the string is a valid expression, and as long as you know how many indices there are you can select them, but there is no way to capture a group multiple times with a single JavaScript .exec call.

However, the language is regular, so a regex can at least validate it:

^([a-zA-Z][a-zA-Z_0-9]*)(\[[0-9]+\])*$

The first group matches the variable name, and the second group (repeated 0 to n times by the * quantifier) matches an index. When the group repeats, only its last repetition is kept.
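
A short sketch of that limitation, assuming an anchored pattern that requires at least one digit per index (the names `pattern` and `match` are illustrative):

```javascript
// Anchored at both ends, with at least one digit required inside each bracket pair.
const pattern = /^([a-zA-Z][a-zA-Z_0-9]*)(\[[0-9]+\])*$/;

const match = pattern.exec("foo[4][2][234523][3]");
console.log(match[1]); // foo
console.log(match[2]); // [3]  (a repeated group keeps only its last repetition)

// Validation itself works fine:
console.log(pattern.test("foo[123]bar")); // false
console.log(pattern.test("foo[4][2]"));   // true
```

So the pattern can accept or reject a string, but the individual indices have to be collected in a second step.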

So if you want to do this, I recommend using a different parsing approach:

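function parse(str) {
  // Scan forward to the first '['; everything before it is the name.
  let idx = 0;
  while (idx < str.length && str[idx] !== '[') {
    idx++;
  }
  // Reject an empty name or a string without any bracket.
  if (idx === 0 || idx === str.length) return null;

  let name = str.slice(0, idx);

  let indices = [];
  while (str[idx] === '[') {
    let startIdx = idx + 1;
    // Advance to the closing ']' of this index.
    idx = startIdx;
    while (idx < str.length && str[idx] !== ']') {
      idx++;
    }
    if (idx === str.length) return null; // unterminated bracket
    // Note: the bracket contents are not checked to be digits here.
    indices.push(str.slice(startIdx, idx));
    idx++; // skip the ']'
  }

  // Anything left after the last ']' makes the input invalid.
  if (idx !== str.length) return null;
  return {name, indices};
}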
Lux

Here is a small ES6 version of the two-step regular expression approach to get the desired array:

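function interpret(s) {
    // Step 1: split into name and bracket part; step 2: pull the numbers out of the bracket part.
    // The `|| []` guards against a name without any index, where match() returns null.
    return (/^(\w+)\s*((?:\[\s*\d+\s*\]\s*)*)$/.exec(s) || [,null]).slice(1).reduce(
        (fun, args) => [fun].concat(args.match(/\d+/g) || []));
}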

var s = 'foo[4][2][234523][3]';
var result = interpret(s);
console.log(result);

It first gets the two main parts via exec(), which returns an array of three elements: the complete match, the variable name, and the bracketed remainder. slice(1) then drops the first of those three, and the other two are passed to reduce.

The reduce callback will only be called once, since there is no initial value provided.

This is convenient, as it actually means the callback gets the two parts as its two arguments. It applies the second regular expression to split the index string, and returns the final array.

The || [,null] will take care of the case when the original match fails: it ensures that reduce acts on [null] and thus will return null.
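
The `reduce` behaviour this relies on can be checked in isolation. In the sketch below, `calls` and `cb` are illustrative names, and the callback only records its arguments:

```javascript
const calls = [];
const cb = (a, b) => { calls.push([a, b]); return a + ":" + b; };

// Two elements, no initial value: the callback runs exactly once,
// receiving the first element as accumulator and the second as current value.
console.log(["foo", "[4][2]"].reduce(cb)); // foo:[4][2]
console.log(calls.length);                 // 1

// [,null].slice(1) is [null]: a single element and no initial value,
// so reduce returns that element without calling the callback at all.
const fallback = [, null].slice(1);
console.log(fallback.reduce(cb)); // null
console.log(calls.length);        // 1 (unchanged)
```

Note that `reduce` on an empty array without an initial value throws a `TypeError`, which is why the fallback needs to contain one element rather than none.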

trincot