
If the RegExp is defined globally and I use the vals array the way I do here, exec returns the same match every time, so the code below is an infinite loop.

var regex = RegExp(/<(.*?)>.*?<\/\1>/, "g");

function readXml(xmlString) {
  var obj = {};
  var vals;
  for (var i = 0;
    (vals = regex.exec(xmlString)) !== null; i++) {
    if (!obj[vals[1]]) obj[vals[1]] = [];
    // Strip "<tag>" (tag length + 2) and "</tag>" (tag length + 3) to get the inner text.
    obj[vals[1]].push(readXml(vals[0].slice(vals[1].length + 2, -vals[1].length - 3)));
  }
  if (i == 0) return xmlString;
  return obj;
}
console.log(readXml("<a>a</a><b>b</b>"));

If the RegExp is defined inside the function, exec returns the next match every time, so the code below logs an object with a and b.

function readXml(xmlString) {
  var regex = RegExp(/<(.*?)>.*?<\/\1>/, "g");
  var obj = {};
  var vals;
  for (var i = 0;
    (vals = regex.exec(xmlString)) !== null; i++) {
    if (!obj[vals[1]]) obj[vals[1]] = [];
    // Strip "<tag>" (tag length + 2) and "</tag>" (tag length + 3) to get the inner text.
    obj[vals[1]].push(readXml(vals[0].slice(vals[1].length + 2, -vals[1].length - 3)));
  }
  if (i == 0) return xmlString;
  return obj;
}
console.log(readXml("<a>a</a><b>b</b>"));

If I do something else with the vals array in the loop, exec returns the next match every time, so the code below logs an empty object.

var regex = RegExp(/<(.*?)>.*?<\/\1>/, "g");

function readXml(xmlString) {
  var obj = {};
  var vals;
  for (var i = 0;
    (vals = regex.exec(xmlString)) !== null; i++) {
    vals = [2]
  }
  if (i == 0) return xmlString;
  return obj;
}
console.log(readXml("<a>a</a><b>b</b>"));

I think it should be an object with a and b in the first case too.

Why doesn't it just do the same thing in all cases?

TheBlueOne
  • Just because *someone* has to say it: You **can't** correctly process XML with a naive regular expression like that. You need to use a parser. There's an XML parser built into the browser, and several available for other environments. There's no reason not to use one. – T.J. Crowder Oct 07 '19 at 10:32
  • The fundamental answer to *"Why doesn't it just do the same thing in all cases?"* is that regular expression objects **have state** (when you use the `g` flag): They remember where the last match was, and continue from that point. In some places above, you're reusing the same object. In other places, you're creating a new one each time. Hence the differences. – T.J. Crowder Oct 07 '19 at 10:33
  • Also note that `var regex = RegExp(/<(.*?)>.*?<\/\1>/, "g");` is an error-prone way to write `var regex = /<(.*?)>.*?<\/\1>/g;`, which would be the preferred way. – T.J. Crowder Oct 07 '19 at 10:34
  • I know that (there is a parser for XML), but I just need to parse very simple XML without attributes and so on. And I couldn't find another example. – TheBlueOne Oct 07 '19 at 10:35
  • Everyone always thinks "Oh, but my example is simple and well-contained." It is until it isn't, and moreover, why waste your time? Use a parser. – T.J. Crowder Oct 07 '19 at 10:35
  • @T.J.Crowder just to add to the regex with the **g** flag being stateful - this function is recursive, so if the regex is in the global scope, then the state of the regex is shared between each invocation of `readXml`. In the second case, the state persists just for *each* execution of `readXml`, so when it recursively calls itself, a new regex keeping track of its own state is created. – VLAZ Oct 07 '19 at 10:41
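To make the shared-state point concrete, here is a minimal step-by-step sketch of what the single global regex goes through in the first version of readXml. It calls exec by hand instead of recursing, and it assumes the recursive call receives the inner text "a"; any inner string shorter than the current lastIndex behaves the same way, because exec fails on it and resets lastIndex to 0.

```javascript
// The same shared, global regex as in the question's first version.
const regex = /<(.*?)>.*?<\/\1>/g;
const input = "<a>a</a><b>b</b>";

// Outer loop, first iteration: matches "<a>a</a>" and advances lastIndex.
const first = regex.exec(input);
console.log(first[0], regex.lastIndex); // logs: <a>a</a> 8

// The recursive call runs exec on the inner text "a" with the SAME object.
// lastIndex (8) is past the end of "a", so exec fails, returns null,
// and resets lastIndex to 0 as a side effect.
const inner = regex.exec("a");
console.log(inner, regex.lastIndex); // logs: null 0

// Back in the outer loop the search starts from index 0 again,
// so it finds "<a>a</a>" once more, and this repeats forever.
const again = regex.exec(input);
console.log(again[0], regex.lastIndex); // logs: <a>a</a> 8
```

In the third version there is no recursive call, so nothing resets lastIndex between iterations and the loop ends normally.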

1 Answer

Global regexes (i.e. those using the g flag) keep track of the position of the last match in order to get the next one the next time you call exec. This assumes you're passing the same input string each time; if you change the input, the next search still starts from the old position, so it can skip characters or find nothing at all.

const regex = RegExp(/./,'g'); // should match each character, one at a time.

let input1 = 'test', match;
match = regex.exec(input1);
console.log(match[0]); // "t"; regex.lastIndex is now 1

// now let's "re-use" the regex object...
let input2 = 'another test';
match = regex.exec(input2); // the search starts at index 1, not 0
console.log(match[0]); // "n"

console.log("Where did the `a` go??");

You can access and modify the property responsible for this behaviour via regex.lastIndex, although in this case your best solution is to create a new RegExp object within the function, since you're working recursively.
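A minimal sketch of two ways around the shared state. Part 1 resets lastIndex by hand on the character example above. Part 2 is an alternative the answer does not mention: it keeps one module-level regex but iterates with String.prototype.matchAll (ES2020), which runs on an internal copy of the regex, so the outer loop and the recursive calls never touch the shared object's lastIndex. The second capture group for the inner text and the names charRegex and tagRegex are assumptions for illustration, and like the question's code this only handles simple, attribute-free XML.

```javascript
// Part 1: reset lastIndex before reusing a global regex on a new string.
const charRegex = /./g;

let match = charRegex.exec("test");
console.log(match[0]); // logs: t  (charRegex.lastIndex is now 1)

charRegex.lastIndex = 0; // start the next search from the beginning
match = charRegex.exec("another test");
console.log(match[0]); // logs: a

// Part 2: a recursion-safe readXml that keeps a module-level regex.
// Group 1 is the tag name, group 2 the inner text (assumed layout).
const tagRegex = /<(.*?)>(.*?)<\/\1>/g;

function readXml(xmlString) {
  const obj = {};
  let count = 0;
  // matchAll copies tagRegex, so each call gets its own iteration state.
  for (const [, tag, inner] of xmlString.matchAll(tagRegex)) {
    if (!obj[tag]) obj[tag] = [];
    obj[tag].push(readXml(inner));
    count++;
  }
  // No tags found: treat the string as a text value.
  return count === 0 ? xmlString : obj;
}

console.log(JSON.stringify(readXml("<a>a</a><b>b</b>"))); // logs: {"a":["a"],"b":["b"]}
```

Creating the regex inside the function, as the answer recommends, remains the simplest fix; matchAll is just another way to get fresh matching state on every call.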

Niet the Dark Absol
  • If this is what's wrong with the OP's code (I didn't go into the code deeply enough to decide), isn't this just a duplicate of [*Why does a RegExp with global flag give wrong results?*](https://stackoverflow.com/questions/1520800/why-does-a-regexp-with-global-flag-give-wrong-results)? – T.J. Crowder Oct 07 '19 at 10:36
  • @T.J.Crowder Yeah probably. Oops. – Niet the Dark Absol Oct 07 '19 at 10:37