Why does this happen at the end of the for loop with vectors c++

Question

I want to erase repeated elements in a vector; I used a for loop to check if the next element in the vector is the same as the current element in the iteration and then delete it if true but, for some reason, it deletes the last element without being equal.

Here's my code:

#include <string>
#include <vector>
#include <iostream>

using namespace std;

template <typename T> vector<T> uniqueInOrder(const vector<T>& iterable){
    vector<T> coolestVector = iterable;
    for (int i = 0; i < coolestVector.size(); i++)
    {
        if (coolestVector[i] == coolestVector[i+1]){
            coolestVector.erase(coolestVector.begin()+i);
            i--;
        }
        /*for (int i = 0; i < coolestVector.size(); i++)
        {
            cout<<coolestVector[i]<<", ";
        }
        cout<<i<<", ";
        cout<<coolestVector.size();
        cout<<endl;*/
    }

    for (int i = 0; i < coolestVector.size(); i++)
    {
        cout<<coolestVector[i]<<endl;
    }
    
    return coolestVector;
}
vector<char> uniqueInOrder(const string& iterable){
    vector<char> coolVector = {};
    for (int i = 0; i < iterable.size(); i++)
    {
        coolVector.push_back(iterable[i]);
    }
    const vector<char> realVector = coolVector;
    uniqueInOrder(realVector);
}

int main(){
    const string test = "AAAABBBCCDAABBB";
    uniqueInOrder(test);
}

output:

vector 0: A, A, A, B, B, B, C, C, D, A, A, B, B, B, iterator value -1, vector size 14
vector 0: A, A, B, B, B, C, C, D, A, A, B, B, B, iterator value -1, vector size 13
vector 0: A, B, B, B, C, C, D, A, A, B, B, B, iterator value -1, vector size 12
vector 1: A, B, B, B, C, C, D, A, A, B, B, B, iterator value 0, vector size 12
vector 1: A, B, B, C, C, D, A, A, B, B, B, iterator value 0, vector size 11
vector 1: A, B, C, C, D, A, A, B, B, B, iterator value 0, vector size 10
vector 2: A, B, C, C, D, A, A, B, B, B, iterator value 1, vector size 10
vector 2: A, B, C, D, A, A, B, B, B, iterator value 1, vector size 9
vector 3: A, B, C, D, A, A, B, B, B, iterator value 2, vector size 9
vector 4: A, B, C, D, A, A, B, B, B, iterator value 3, vector size 9
vector 4: A, B, C, D, A, B, B, B, iterator value 3, vector size 8
vector 5: A, B, C, D, A, B, B, B, iterator value 4, vector size 8
vector 5: A, B, C, D, A, B, B, iterator value 4, vector size 7
vector 5: A, B, C, D, A, B, iterator value 4, vector size 6
vector 5: A, B, C, D, A, iterator value 4, vector size 5
A
B
C
D
A

Expected:

A
B
C
D
A
B

You are aware that `coolestVector[(coolestVector.size()-1)+1]` invokes undefined behavior, right? ;) — JaMiT, Oct 07 '21 at 02:57
Does this answer your question? [Erasing from a std::vector while doing a for each?](https://stackoverflow.com/questions/3938838/erasing-from-a-stdvector-while-doing-a-for-each) — Ken Y-N, Oct 07 '21 at 03:09
Read about [`std::unique`](https://en.cppreference.com/w/cpp/algorithm/unique). — Pete Becker, Oct 07 '21 at 03:33
The diagnostic that you have commented out is a good approach. However, I would put it at the beginning of the loop instead of the end. That gives you a look at the data heading into your `if` statement, so you can better judge what `coolestVector[i] == coolestVector[i+1]` should evaluate to just before the last element is erased; that is, when `coolestVector` contains `A, B, C, D, A, B`, and `i` is `5`. (What are the sixth and seventh elements of this six-element array? Are they equal?) — JaMiT, Oct 09 '21 at 15:55

score 0 · Answer 1 · answered Oct 16 '21 at 20:55

Why is the code incorrect?

Many people learn to iterate over an array or vector by memorizing the setup

for (int i = 0; i < X.size(); i++)

This is good for basic loops, but there are times when it is inadequate. Do you know why the conditional is i < X.size()? A basic understanding would say that this conditional ensures that the loop body is executed a number of times equal to the size of X. This is not wrong, but that rationale is more applicable when i is not used inside the loop body. (As an example, that rationale would apply equally well if i started at 1 and the loop continued as long as i <= X.size(), yet that is not a good way to iterate over an array/vector.)

A deeper understanding looks at how i is used in the loop body. A common example is printing the elements of X. (This is preliminary; we'll return to the question's situation later.) A loop that prints the elements of X might look like the following:

for (int i = 0; i < X.size(); i++)
    std::cout << X[i] << ' ';

Note the index given to X – this is key to the loop's condition. The condition's deeper purpose is to ensure that the indices stay within the valid range. The indices given to X must not drop below 0 and they must remain less than X.size(). That is, index < X.size() where index gets replaced by whatever you have in the brackets. In this case, the thing in the brackets is i, so the condition becomes the familiar i < X.sixe().

Now let's look at the question's code.

for (int i = 0; i < coolestVector.size(); i++)
{
    if (coolestVector[i] == coolestVector[i+1]){
        // Code not using operator[]
    }
    // Diagnostics
}

There are two places where operator[] is used inside the loop. Apply the above "deeper understanding" to each of them, then combine the resulting conditionals with a logical "and".

The first index is i, so the goal index < X.size() becomes i < coolestVector.size() for this case.
The second index is i+1, so the goal index < X.size() becomes i+1 < coolestVector.size() for this case.

Combining these gives i < coolestVector.size() && i+1 < coolestVector.size(). This is what the loop's conditional should be to ensure that the indices stay within the valid range. Something logically equivalent would also work. Assuming that i+1 does not overflow (which would entail another class of problems), if i+1 is less than some value then so is i. It is enough to check that i+1 is in range, so we can simplify this conditional to i+1 < coolestVector.size().

for (int i = 0; i+1 < coolestVector.size(); i++)  // <--  Fixed!
{
    if (coolestVector[i] == coolestVector[i+1]){
        // Code not using operator[]
    }
    // Diagnostics
}

(I know, that was a lot of writing to say "add one". The point is to give you – and future readers – the tools to get the next loop correct.)

Note that the same principle applies to the start of the loop. We start i at 0 so that i >= 0. This happens to imply i+1 >= 0 as well, so in this case there is nothing extra to be done. However, if one of the used indices was i-1, then you would need to ensure i-1 >= 0, which would be done by starting i at 1.

Look at your indices to determine where your loop control variable should start and stop.

score 0 · Answer 2 · answered Oct 16 '21 at 21:04

I have separated this from my earlier answer because the earlier answer can stand on its own and I do not want it embroiled in the potential controversy that comes from explaining undefined behavior.

Why did the program consistently remove the last element?

Officially, we are in the realm of undefined behavior, so anything is possible. However, it is very likely that this behavior will be seen in all release builds, with two caveats.

An earlier element was removed. If no elements should be removed (a case worth adding to your test suite), then the behavior is unpredictable, possibly a crash but most likely the expected behavior.
Move construction leaves behind a copy. This is true for simple types like char. You likely will not see this behavior for a vector of std::string.

When an element in the middle of a std::vector is erased, all of the elements after that element are shifted down an index; they are copied (or moved) to the preceding element.

A B B C D
  ^
  |-- erase this

A B B C D
  ^ ^ ^    <--- shift and copy (or move)
  B C D

A B C D D
      ^
      |-- Last element in the vector

Note that space is not released upon erasing. The vector still owns the memory where D used to be; it's just that accessing the element from outside the vector implementation is undefined behavior. Also, that memory is unlikely to have its bits changed by the vector in a release build. So it is very likely that past the end of the vector is a copy of the last element of the vector, unless the move constructor changed it.

Now comes your condition. When i is coolestVector.size()-1, you check to see if the last element of the vector (coolestVector[i]) equals the element past the end of the vector (coolestVector[i+1]). A release build will not verify that the index is valid, and the operating system does not care if that location in memory is accessed, so this comparison is likely to go through as one might naively expect. Does the last element of the vector equal the thing from which it was copied? Yes! OK, delete the last element.

Very likely in a release build, but don't rely on it.

score -1 · Answer 3 · edited Oct 07 '21 at 06:02

-1

You can use std::set for unique elements that too in linear time O(1).

void Unique_Vector(vector<string>&v,int size)
{
   std::set<string>s;
   for(auto i : v)
   {
      s.insert(i);
   }
   std::cout<<"Vector after removing duplicate :";
   for(auto i : s)
   {
       std::cout<<i<<" ";
   }
}

edited Oct 07 '21 at 06:02

ouflak

2,458
10
44
49

answered Oct 07 '21 at 04:58

Mohsin Mehmood

80
9

*"in linear time O(1)"* -- O(1) represents constant time, not linear. Inserting into a set takes logarithmic time, and the insertion is done `v.size()` times, so the time complexity of this is neither "linear" nor "O(1)". Perhaps you meant to use an unordered set to net linear complexity on average? – JaMiT Oct 08 '21 at 01:41
Iterating over a set will not produce the expected result. The expected result preserves the original order, and duplicate values are allowed as long as they are not adjacent. – JaMiT Oct 08 '21 at 01:44

Why does this happen at the end of the for loop with vectors c++

3 Answers3

Why is the code incorrect?

Why did the program consistently remove the last element?

Linked