2

I am trying to remove a word from an array based on an index which doesn't exist in another array, but I am observing odd behavior when I use the splice and filter methods.

Can anyone explain the scenario below? Why is it happening like this in both cases, even though the same object is being altered on iteration?

Words

['one', 'two', 'three', 'four', 'five', 'six', 'seven']

Removable Words

['four', 'two', 'eight']

let words = ['one', 'two', 'three', 'four', 'five', 'six', 'seven'];
let removedWords = ['four', 'two', 'eight'];

words.forEach((word) => {
  console.log(word);
  if (removedWords.includes(word)) {
    words = words.filter((removableWord) => removableWord !== word)
  }
});

/* Output */
//one
//two
//three
//four
//five
//six
//seven

let words = ['one', 'two', 'three', 'four', 'five', 'six', 'seven'];
let removedWords = ['four', 'two', 'eight'];
words.forEach((word, index) => {
  console.log(word);
  if (removedWords.includes(word)) {
    words.splice(index, 1);
  }
});

/* Output */
//one
//two
//four
//six
//seven

As mentioned in this Mozila document forEach() does not make a copy of the array before iterating. Shouldn't it behave the same as splice after filtering and assigning back to the original object?

Note: Just to add on this, the splice method makes changes on original array and the filter method creates a new copy of the array and doesn't alter the original array, but in the given example (after filtering), the result is assigned back to the original array.

R. Richards
  • 24,603
  • 10
  • 64
  • 64
Dip Parmar
  • 312
  • 1
  • 13
  • 7
    Don't modify your array _while_ you're iterating over it. In fact, why use a `forEach` at all if you clearly already know `filter`? `const cleaned = words.filter(w => !removedWords.includes(w))` and done? – Mike 'Pomax' Kamermans Jul 26 '22 at 05:08
  • 1
    @Mike'Pomax'Kamermans Yes it makes sense and I agree, but can you please explain why different behavior when using the similar logic. What is happening at JS level that causing this behavior. – Dip Parmar Jul 26 '22 at 05:14
  • Once possible explanation `splice` mutates the array and `filter` don't – kiranvj Jul 26 '22 at 05:16
  • `What is happening at JS level` you're changing the length of the array. So, if it's the first iteration, you remove `[0]` ... on the next iteration the item will be from index `1` - which was index 2 ... so you can see the problem – Jaromanda X Jul 26 '22 at 05:17
  • @kiranvj yes that I already added it's changing original array with splice but we are assigning back after filtering too – Dip Parmar Jul 26 '22 at 05:17
  • Unlike with arrays, you can safely delete items from a [map](https://stackoverflow.com/questions/46149010/map-deletekey-during-map-foreach) during iteration. Yet, I've never tried resetting the value of the map itself during iteration; that would probably still cause problems. – Lonnie Best Jul 26 '22 at 05:22
  • @LonnieBest please cite your references why you think removing items from arrays is not safe. AFAIK, it is safe: see https://stackoverflow.com/a/35943995/1041641 – TmTron Jul 26 '22 at 05:53
  • @TmTron: This is from the same user of the post you sited: https://stackoverflow.com/a/56705849/217867 – Lonnie Best Jul 26 '22 at 17:01
  • 1
    @LonnieBest Ah, I see. It seems we just have a different definition of what "safe" means: for me "unsafe" means, that something causes a crash or errors (like memory corruption). Anyway I think we all agree that removing array items in a for-loop is a bad idea, because you may miss some items. – TmTron Jul 27 '22 at 06:25

2 Answers2

2

Your first example also works, but your console.log may confuse you. You log once for every word in the for loop before filtering.
Just log the result words after the loop to see that it works.

let words = ['one', 'two', 'three', 'four', 'five', 'six', 'seven'];
let removedWords = ['four', 'two', 'eight'];

words.forEach((word, index, iterationArray) => {
  console.log(index, word, iterationArray.length, words.length);
  if (removedWords.includes(word)) {
    words = words.filter((removableWord) => removableWord !== word)
  }
});

console.log(words);

Answer to the OPs comment:
So I guess, that you are confused by

"forEach() does not make a copy of the array before iterating"

  • it is true, that forEach() does not make a copy: but you make a copy inside the loop
  • at the start the variable words is a reference to the original array ['one', 'two', 'three', 'four', 'five', 'six', 'seven']
  • now you call words.forEach() which is a function that returns an iterator on this array (The iterator will always point to this original array, no matter if you change where the words reference points to later)
    • I've added the 3rd parameter iterationArray to forEach which is the array that the iterator uses
    • I've also added a console.log inside the loop: note, that iterationArray will not change, but words.length will change (because you assign new arrays to it)
  • in the loop you create a new array using words.filter (e.g. ['one', 'two', 'three', 'five', 'six', 'seven']) and change the words variable to point to this new array - BUT this does not change the iterator that you have already created before: i.e. the forEach loop still "points" to the original array

For the 2nd example:

let words = ['one', 'two', 'three', 'four', 'five', 'six', 'seven'];
let removedWords = ['four', 'two', 'eight'];
words.forEach((word, index, iterationArray) => {
  console.log(word, index, iterationArray.length);
  if (removedWords.includes(word)) {
    words.splice(index, 1);
  }
});
console.log(words);
  • again, forEach will not make a copy
  • the iterator will point to the original array
  • but now you change the original array, inside of your loop: i.e.
    • when the loop reaches the word "two" at index 1, you change the original array to ['one', 'three', 'four', 'five', 'six', 'seven']
    • in other words: you delete the item at index 1 and the other items are shifted to the left
  • now the iterator will continue and the next index is 2
    • since you have altered the original array, the value at index 2 is now four (and you missed the word trhee which is now at index 1 which the iterator has already processed
  • note, the console.log inside the loop: you can see that the iterationArray is changed
TmTron
  • 17,012
  • 10
  • 94
  • 142
  • You are missing a point here the question is about why different behavior on related things, as it's having non sequential data(four, two, eight) in removedWords, variable words output is same due to non sequential data after loop completes, if removable words have sequential data(two, three, four) then output will be different after loop completes. – Dip Parmar Jul 26 '22 at 06:06
  • I've added more details to my answer - I hope that can help you understand what's going on – TmTron Jul 26 '22 at 06:36
  • 1
    Yes, makes sense now.. – Dip Parmar Jul 26 '22 at 06:49
0

When you say let words = <an array> the reference to that array object (say ref1) is stored in the variable words. When you call forEach on that reference (ref1), it will keep referring to that reference perpetually.

Inside the loop, after filtering, you are getting a new filtered array which is a different array in memory. You may use the same words variable to hold the reference (ref2 / ref3), but this doesn't change the one on which forEach is acting on.

However, when you use splice, the original array edits itself.

Note: Not only that, you are producing 2 different filtered arrays successively with each call to filter.

['one', 'two', 'three', 'five', 'six', 'seven']
['one', 'three', 'five', 'six', 'seven']

Eventually, your first method works and produces the desired result, but you create 'X' copies of arrays if you have X items to be removed from words.

Your second method is better on performance because it doesn't keep producing copies of the array for each removal, but you have to think about which index you are removing, and how forEach will continue after an index is removed.

In either case, your console log is placed in the wrong place.

Teddy
  • 4,009
  • 2
  • 33
  • 55
  • *"Your second method is better on performance"*: This is an example of [premature optimzation](https://softwareengineering.stackexchange.com/questions/80084/is-premature-optimization-really-the-root-of-all-evil) that we should avoid. When you make performance claims, you should provide references to your sources and mention in which cases, and which environments (e.g. browsers, nodejs-versions, ..) that claim is true. E.g. the [ECMAScript Language Specification](https://262.ecma-international.org/5.1/#sec-15.4.4.12) states that `splice` will also create a new array. – TmTron Jul 26 '22 at 05:50
  • @TmTron I do agree I might have been wrong about splice. (Not able to open that ECMAScript reference which you provided.. it simply refuses to load). But, are you flatly saying we should not think one bit about performance until it is proven that there is user-perceivable impact on performance? – Teddy Jul 26 '22 at 07:18
  • OP's method 1 is guaranteed to produce as many new arrays as there are "matched" elements between words and removedWords. Method 2 with splice could be better depending on the environment.. I may have to do some digging if this is indeed true. Better way would be to construct a single new array, say `retainedWords`, which keeps only the needed words. – Teddy Jul 26 '22 at 07:22
  • While I agree premature optimization is bad, I don't believe it is right to give zero thought to creating dozens of collections where it is totally avoidable. I would avoid premature optimization when we do something unnatural or extra in the name of optimization. – Teddy Jul 26 '22 at 07:24
  • @TmTron From the ECMAScript link which you shared, I can see that the new array being created is an array of deleted items. So, in OP's case, each of those is a 1-item splice, so Method 2 will generate a few single-item arrays. Whereas Method 1 would create create an equal number of "all retained items" arrays, which is bound to be much bigger. – Teddy Jul 26 '22 at 07:28
  • My take on this is to use the simplest and easiest code to understand whenever possible. When we really hit a performance issue or when we are sure, that some places in the code are performance sensitive, then we will still implement the simplest code first, then create unit-tests, then create performance tests, then create optimized code and measure the difference between the simple and optimized version. Often the results are surprising and when there is only a negligible performance benefit we stick to the simple code. – TmTron Jul 26 '22 at 07:34
  • @TmTron I agree - I would avoid complicated logic in the name of performance. But, when you have 3 equally simple options, you can discard the ones which have obvious pitfalls. – Teddy Jul 26 '22 at 07:51
  • `splice` must not only create a new array for the deleted items, it must also shift existing items, and when you compare the algorithms [filter](https://262.ecma-international.org/5.1/#sec-15.4.4.20) is way easier. I agree that splice *could theoretically* be faster - but I would not take this for granted in every environment. – TmTron Jul 26 '22 at 07:57
  • @TmTron True. Splice could be faster, but I wouldn't rely on it, especially when there are other simple alternatives. That's what I meant by this "Better way would be to construct a single new array, say retainedWords, which keeps only the needed words.". Filter does exactly that if you just return false for the ones which you don't want to keep. – Teddy Jul 26 '22 at 08:01
  • I totally agree – TmTron Jul 26 '22 at 08:03