1

I am writing a recursive algorithm to build a finite state automaton by parsing a regular expression. The automaton iterates through the expression, pushing characters to a stack and operators to an "operator stack." When I encounter "(" (indicating a grouping operation), I push a "sub automaton" to the stack and pass the rest of the pattern to the sub automaton to parse. When that automaton encounters ")", it passes the rest of the string up to the parent automaton to finish parsing. Here is the code:

var NFA = function(par) {
  this.stack = [];
  this.op_stack = [];
  this.parent = par;

};

NFA.prototype.parse = function(pattern) {
  var done = false;
  for(var i in pattern) {
    if (done === true) {
      break;
    }
    switch(pattern.charAt(i)) {
      case "(":
        var sub_nfa = new NFA(this);
        this.stack.push(sub_nfa);
        sub_nfa.parse(pattern.substring(i+1, pattern.length));
        done = true;
        break;
      case ")":
        if (this.parent !== null) {
          var len = pattern.length;
          /*TROUBLE SPOT*/
          this.parent.parse(pattern.substring(i, pattern.length));
          done = true;
          break;
        }
      case "*":
        this.op_stack.push(operator.KLEENE);
        break;
      case "|":
        this.op_stack.push(operator.UNION);
        break;
      default:
        if(this.stack.length > 0) {
          //only push concat after we see at least one symbol
          this.op_stack.push(operator.CONCAT);        
        }
        this.stack.push(pattern.charAt(i));
    }
  }
};

Note the area marked "TROUBLE SPOT". Given the regular expression "(a|b)a", the call this.parent.parse, is called exactly once: when the sub-automaton encounters ")". At this point, pattern.substring(i, pattern.length) = ")a". This "works", but it isn't correct because I need to consume the ")" input before I pass the string to the parent automaton. However, if I change the call to this.parent.parse(pattern.substring(i+1, pattern.length), parse gets handed the empty string! I have tried stepping through the code and I cannot explain this behavior. What am I missing?

At Juan's suggestion, I made a quick jsfiddle to show the problem when trying to parse "(a|b)a" with this algorithm. In the ")" case, it populates an empty div with the substring at the i index and the substring at the i+1 index. It shows that while there are 2 characters in the substring at i, the substring at i+1 is the empty string! Here's the link: http://jsfiddle.net/XC6QM/1/

EDIT: I edited this question to reflect the fact that using charAt(i) doesn't change the behavior of the algorithm.

Benjamin Malley
  • 233
  • 4
  • 14

2 Answers2

1

I think the previous answer was on the right track. But there also looks to me to be an off-by-one error. Shouldn't you be increasing the index for your substring? You don't want to include the ")" in the parent parse, right?

this.parent.parse(pattern.substring(i + 1, pattern.length));

But this will still fail because of the error Juan mentioned. A quick temporary fix to test this would be to convert the i to a number:

this.parent.parse(pattern.substring(+i + 1, pattern.length));

This might do it for you. But you should probably go back and switch away from the for-in loop on the string. I think that's causing your issue. Turn it into an array with str.split('') and then use an integer to loop. That will prevent further such issues.

Scott Sauyet
  • 49,207
  • 4
  • 49
  • 103
  • It's not an off by one error because the OP did realize they needed to increment the index. Except for that, you've diagnosed the problem correctly, see my answer for what I consider a better fix. – Ruan Mendes Feb 19 '13 at 19:35
  • I still believe there's an off-by-one in the code. But I guess that depends on the op_stack expected. – Scott Sauyet Feb 19 '13 at 22:07
  • There was an off by one error, but the OP had already realized it, they just couldn't figure out why `i+1` wasn't fixing it – Ruan Mendes Feb 20 '13 at 02:10
0

The real problem is the fact that you were using a for in to iterate through the characters of the string. With the for in loop, your i is going to be a string, therefore, when you try to do i+1, it does string concatenation.

If you change your iteration to be

for(var i=0; i < pattern.length; i++) {

Then it all works fine http://jsfiddle.net/XC6QM/2/

Scott's answer correctly identified the problem but I think his solution (converting the indexes to numbers) is not ideal. You're better off looping with a numeric index to begin with

Also, you should not use brackets to access characters within a string, that does not work in IE 7

 switch(pattern[i]) {

should be

 switch(pattern.charAt(i)) { 
Community
  • 1
  • 1
Ruan Mendes
  • 90,375
  • 31
  • 153
  • 217
  • This is a good tip, but it doesn't change the behavior of the algorithm. The "switch" portion of the code works fine, it's the substring behavior using the indices (not the characters) that seems to be failing. I'm editing my question to reflect this comment. – Benjamin Malley Feb 19 '13 at 18:55
  • Note that my suggestion was that converting the indices to numbers was very specifically a very temporary fix and that the ideal solution would be to use numbers from the start. – Scott Sauyet Feb 19 '13 at 22:04