ES2020 contains a new String.prototype.matchAll method, which returns an iterator. I'm sure I'm missing something dumb/obvious, but I don't see why it doesn't just return an array instead.

Can someone please explain the logic there?

EDIT: Just to clarify something from the comments, I'm operating on the assumption that iterators haven't simply replaced arrays as the new way all JS APIs going forward will return multiple values. If I missed that memo, and all new JS functions do return iterators, a link to said memo would 100% qualify as a valid answer.

But again, I suspect that such a blanket change wasn't made, and that the makers of Javascript made a specific choice, for this specific method, to have it return an iterator ... and the logic of that choice is what I'm trying to understand.

CertainPerformance
machineghost
  • I guess it's a replacement for `exec`, which we used before `matchAll` to get the same functionality, since `match` with the `g` flag doesn't give you the matches the way `exec` does; `matchAll` was proposed to provide that functionality. – Code Maniac Apr 12 '20 at 15:58
  • But `exec` returns an Array, *not* an iterator. From the MDN: "The exec() method executes a search for a match in a specified string. Returns a result array, or null." – machineghost Apr 12 '20 at 15:59
  • It's not like a normal array: `exec` keeps track of the `lastIndex` of the match and, on the next call, searches from there; the array holds the current match and its capture groups. – Code Maniac Apr 12 '20 at 16:01
  • If the match succeeds, the exec() method returns an array (with extra properties index and input; see below) and updates the lastIndex property of the regular expression object. `MDN` – Code Maniac Apr 12 '20 at 16:04
  • If you return an array, the complete result has to be known when the function call finishes. Returning an iterator allows the next result to be evaluated at the time it is requested. Depending on the use case, this can have benefits for memory and/or responsiveness. – t.niese Apr 12 '20 at 16:08
  • Isn't it to protect memory? An array has to be precomputed/allocated while iterator can be implemented lazily. This also means that if only few iterations are made the rest doesn't even need to be computed. – Wiktor Zychla Apr 12 '20 at 16:08
  • Just to focus things, I agree there are obviously pros/cons to iterables vs. arrays. I'm not questioning that. But at the same time look: iterables haven't replaced arrays in the JS language. The API is still *full* of things that return arrays. So my question isn't "why are iterables better/worse?", it's "why *in this specific method*, seemingly going against the trend of previous regex stuff, did they decide an iterator was what `matchAll` should return, instead of an array?" – machineghost Apr 12 '20 at 16:13
  • Protecting app from extensive memory usage is just safe. If there are means in the language to make apis safer why not use it? – Wiktor Zychla Apr 12 '20 at 16:17
  • By that logic, no new function should ever return an array now that iterators exist, because iterators are just safer arrays ... but again, iterators are *not* "arrays 2.0" in Javascript. The makers of JS did not just decide "all methods in ES versions after the iterator one will return iterators instead of arrays, because they are the new superior array" ... so saying (more or less) "iterators have clear advantages to arrays" (while 100% accurate) doesn't answer the question. – machineghost Apr 12 '20 at 16:23
  • @machineghost `going against the trend of previous regex stuff`: just because old functions were defined before iterators existed does not mean that new functions should not use iterators. And changing `exec` to return an iterator is not possible. `no new function should ever return an array now that iterators exist`: for certain tasks you can estimate well how large the result will be and how long it will take to calculate. For `matchAll` it depends on the regular expression and the input, so having a function that lets you avoid fully parsing the result can indeed be helpful. – t.niese Apr 12 '20 at 16:29
  • :) Are you saying that old regex methods did return an array, but one with an iterator-like structure, and that they only didn't use iterators because iterators didn't exist yet? Now that iterators do exist, they are a good solution for this specific problem because of specific reasons, and because of those reasons an iterator was the more natural option here? Because something like that (if a person were to outline those specifics) almost sounds like an answer ... – machineghost Apr 12 '20 at 16:34
  • Yes, that would be my guess for a reasonable explanation. But to know why they decided to do it that way would be a question to ask the committee members of the specification team ;) – t.niese Apr 12 '20 at 16:37
  • Not at all. If there's an obvious logical reason, *and it's explained well*, you 100% don't need a quote from a committee member to get your answer upvoted/accepted. – machineghost Apr 12 '20 at 16:38
  • I don't think this should be the answer. It's just guessing based on reasonable arguments. – Wiktor Zychla Apr 12 '20 at 20:58

1 Answer

This is described in the proposal document:

Many use cases may want an array of matches - however, clearly not all will. Particularly large numbers of capturing groups, or large strings, might have performance implications to always gather all of them into an array. By returning an iterator, it can trivially be collected into an array with the spread operator or Array.from if the caller wishes to, but it need not.
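As a small illustration of the point in that quote, collecting the iterator into an array takes one expression. The sample string and the `pairRegex` name below are made up for this sketch; they are not from the proposal.

```javascript
// Hypothetical sample input; any string and any regex with the g flag would do.
const text = 'k1=v1;k2=v2;k3=v3';
const pairRegex = /(\w+)=(\w+)/g;

// Spread syntax drains the iterator into an array of match arrays.
const viaSpread = [...text.matchAll(pairRegex)];

// Array.from does the same; its optional mapping function can project each match.
const keys = Array.from(text.matchAll(pairRegex), (m) => m[1]);

console.log(viaSpread.length); // 3
console.log(keys); // [ 'k1', 'k2', 'k3' ]
```

So callers who want an array pay one extra expression, while callers who don't want one never build it.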

.matchAll is lazy. When using the iterator, the regex will only evaluate the next match in the string once the prior match has been iterated over. This means that if the regex is expensive, the first few matches can be extracted, and then your JS logic can make the iterator bail out of trying further matches.

For a trivial example of the lazy evaluation in action:

for (const match of 'axxxxxxxxxxxxxxxxxxxxxxxxxxxxy'.matchAll(/a|(x+x+)+y./g)) {
  if (match[0] === 'a') {
    console.log('Breaking out');
    break;
  }
}
console.log('done');

Without the break, the regular expression will go on to attempt a 2nd match, which will result in a very expensive operation.

If matchAll returned an array, and iterated over all matches immediately while creating the array, it would not be possible to bail out.
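To connect this with the `exec` discussion in the comments, here is a rough sketch of how a lazy matcher could be built on top of `exec` with a generator. `lazyMatches` is a hypothetical helper written for illustration; it is not the algorithm from the specification, and it simplifies details such as Unicode-aware advancing after empty matches.

```javascript
// Hypothetical helper, not the spec algorithm: yields exec() results one at a time.
function* lazyMatches(str, regex) {
  // Like matchAll, refuse non-global regexes (exec would return the same match forever).
  if (!regex.global) throw new TypeError('lazyMatches requires the g flag');
  // Work on a copy so the caller's lastIndex is left untouched.
  const re = new RegExp(regex.source, regex.flags);
  let match;
  while ((match = re.exec(str)) !== null) {
    yield match;
    // Step past zero-length matches to avoid looping forever (simplified).
    if (match[0] === '') re.lastIndex++;
  }
}

const it = lazyMatches('a1b2c3', /\d/g);
const first = it.next(); // the first exec() call happens only here
console.log(first.value[0], first.value.index); // 1 1
it.return(); // stop early: the remaining matches are never searched for
console.log(it.next().done); // true
```

Each `next()` call runs exactly one `exec`, which is why breaking out of a `for...of` loop skips any expensive later matches.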

CertainPerformance