1

I hope I can explain myself clearly here and that this is not too much of a specific issue.

I am working on some JavaScript that needs to take a string, find every run of characters between square brackets, store the results, and then remove those bracketed groups from the original string.

My code so far is as follows:

parseLine : function(raw)
{
    var results = [];

    var regex = /\[(.*?)]/g;
    var match;
    while((match = regex.exec(raw)) !== null)
    {
        console.log("  ", match);
        results.push(match[1]);
        raw = raw.replace(/\[(.*?)]/, "");
        console.log("    ", raw);
    }

    return {results:results, text:raw};
}

This seems to work in most cases. If I pass in the string [id1]It [someChar]found [a#]an [id2]excellent [aa]match then it returns all the chars from within the square brackets and the original string with the bracketed groups removed.

The problem arises when I use the string [id1]It [someChar]found [a#]a [aa]match.

It seems to fail when only a single letter (and a space?) follows a bracketed group: it starts missing groups, as you can see in the log if you try it out. It also breaks if I use groups back to back, like [a][b], which I will need to do.

I'm guessing this is my RegEx - begged and borrowed from various posts here as I know nothing about it really - but I've had no luck fixing it and could use some help if anyone has any to offer. A fix would be great but more than that an explanation of what is actually going on behind the scenes would be awesome.

Thanks in advance all.

popClingwrap

3 Answers

3

You could use the replace method with a function to simplify the code and run the regexp only once:

function parseLine(raw) {
  var results = [];
  var parsed = raw.replace(/\[(.*?)\]/g, function(match,capture) {
    results.push(capture);
    return '';
  });
  return { results : results, text : parsed };
}
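As a quick check, here is a minimal sketch that runs the function above against the two inputs the question reports as troublesome. The function is repeated only so the snippet runs on its own; the variable names spaced and adjacent are just illustrative.

```javascript
// Same function as in the answer above, repeated so this snippet is self-contained.
function parseLine(raw) {
  var results = [];
  var parsed = raw.replace(/\[(.*?)\]/g, function (match, capture) {
    results.push(capture); // collect the text inside each pair of brackets
    return '';             // and remove the whole bracketed group
  });
  return { results: results, text: parsed };
}

// A single letter and a space after a bracketed group.
var spaced = parseLine('[id1]It [someChar]found [a#]a [aa]match');
console.log(spaced.results); // [ 'id1', 'someChar', 'a#', 'aa' ]
console.log(spaced.text);    // 'It found a match'

// Groups placed back to back.
var adjacent = parseLine('[a][b]');
console.log(adjacent.results); // [ 'a', 'b' ]
console.log(adjacent.text);    // ''
```

Because replace() walks the original string once, there is no lastIndex bookkeeping to get out of sync.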
Tibos
  • Still not working as I want but I like this approach, much tidier. Cheers. – popClingwrap Feb 10 '14 at 17:07
  • @popClingwrap: Why is it not working as you want? I think this is much cleaner and to the point. – nhahtdh Feb 10 '14 at 17:13
  • The approach is cleaner and maybe I didn't explain well - the code I posted is a simplification of what I'm actually using - but it only works as I want if I remove the 'g' from the regex. Which is why I marked the answer as I did. Kind of a joint effort I suppose and I really appreciate both answers. – popClingwrap Feb 10 '14 at 17:45
1

The problem is caused by the lastIndex property of the regex /\[(.*?)]/g not being reset, since the regex is declared as global. When the global flag g is on, RegExp uses lastIndex to mark the position where the next match attempt starts, and it expects the same string to be fed to RegExp.exec() (explicitly, or implicitly via RegExp.test(), for example) until no more matches can be found. Otherwise, you have to reset lastIndex to 0 before feeding in a new input.

Since your code is reassigning the variable raw on every loop, you are using the wrong lastIndex to attempt the next match.
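To make that concrete, here is a small sketch that performs the first two iterations of the question's loop by hand and prints lastIndex after each exec() call. The variable names first and second are only for illustration.

```javascript
var regex = /\[(.*?)]/g;
var raw = '[id1]It [someChar]found [a#]a [aa]match';

// First attempt: matches "[id1]" at index 0, so lastIndex moves to 5.
var first = regex.exec(raw);
console.log(first[1], regex.lastIndex); // id1 5

// The loop in the question now shortens the string...
raw = raw.replace(/\[(.*?)]/, ''); // 'It [someChar]found [a#]a [aa]match'

// ...but the next attempt still starts at index 5 of the NEW string,
// which is already past the "[" of "[someChar]", so that group is skipped.
var second = regex.exec(raw);
console.log(second[1], regex.lastIndex); // a# 23
```

After the next removal the string is only 24 characters long, so starting at index 23 finds nothing and the loop stops with "[a#]" and "[aa]" still in the text.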

The problem will be solved if you remove the g flag from your regex. Alternatively, you could use the solution proposed by Tibos, where you supply a function to String.replace() to do the replacement and extract the capturing group at the same time.
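A minimal sketch of the first option, assuming the rest of the loop stays as in the question. The standalone function name parseLineLoop is hypothetical and the logging has been dropped for brevity.

```javascript
// Hypothetical standalone version of the question's method, with the g flag removed.
function parseLineLoop(raw) {
  var results = [];
  var regex = /\[(.*?)]/; // no g flag: every exec() call searches from index 0
  var match;
  while ((match = regex.exec(raw)) !== null) {
    results.push(match[1]);       // text inside the brackets
    raw = raw.replace(regex, ''); // remove the group that was just found
  }
  return { results: results, text: raw };
}

var parsed = parseLineLoop('[id1]It [someChar]found [a#]a [aa]match');
console.log(parsed.results); // [ 'id1', 'someChar', 'a#', 'aa' ]
console.log(parsed.text);    // 'It found a match'
```

Keeping the g flag and setting regex.lastIndex = 0 after each replacement should have the same effect, at the cost of one more line to remember.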

nhahtdh
  • That seems to have done the job :) One little 'g', who woulda thunk it. I should really sit down and read up on RegEx instead of just stealing bits from stackoverflow all the time. Cheers – popClingwrap Feb 10 '14 at 17:10
-1

You need to escape the last bracket: \[(.*?)\].
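For comparison, here is a short sketch that runs both spellings of the pattern on the same input. It assumes a regex literal without the u flag, where an unescaped ] outside a character class is treated as a literal character; with the u flag the unescaped form is a syntax error, so escaping is the safer habit. The sample string is made up for illustration.

```javascript
var unescaped = /\[(.*?)]/g;  // pattern from the question
var escaped = /\[(.*?)\]/g;   // pattern with the closing bracket escaped
var sample = '[a][b] text [c#]';

// Both patterns remove the same bracketed groups from the sample.
console.log(sample.replace(unescaped, '') === sample.replace(escaped, '')); // true
```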

tenub