How can a c++ std::vector store objects of different size, and how can ++it know where to jump when it doesn't contain pointers

Question

EDIT: TLDR; I was a victim of object slicing, which I didn't know about. Now the original question follows.

I'm trying to understand how std::vector<MyClass> stores objects when an instance of MyDerived is added to it with push_back. Also, how do iterators know where the next element starts in memory, so that the increment operator ++ knows how far to jump? Consider the following code sample:

#include <iostream>
#include <vector>
using namespace std;

class BaseShape
{
public:
    // BaseShape() { cout << "BaseShape() "; }

    virtual void draw() const { cout << "BASE?\n"; }
};

class Circle : public BaseShape
{
public:
    Circle() { cout << "Circle()"; }

    virtual void draw() const override { cout << "Circle!\n"; }

    void *somePointer, *ptr2;
};

class Triangle : public BaseShape
{
public:
    Triangle() { cout << "Triangle()"; }

    virtual void draw() const override { cout << "Triangle!\n"; }

    void *somePtr, *ptr2, *ptr3, *ptr4, *ptr5;
};

int main()
{

    cout << "vector<BaseShape *> ";
    vector<BaseShape *> pShapes{new BaseShape(), new Circle(), new Triangle(), new Circle()};
    cout << endl;

    for (vector<BaseShape *>::iterator it = pShapes.begin(); it != pShapes.end(); ++it)
    {
        cout << *it << " ";
        (*it)->draw();
    }

    // vector<BaseShape *> Circle()Triangle()Circle()
    // 01162F08 BASE?
    // 01162F18 Circle!
    // 011661A0 Triangle!
    // 01162F30 Circle!

    cout << "\nvector<BaseShape> ";
    vector<BaseShape> shapes{BaseShape(), Circle(), Triangle(), Circle()};
    cout << endl;

    for (vector<BaseShape>::iterator it = shapes.begin(); it != shapes.end(); ++it)
    {
        cout << &(*it) << " ";
        (*it).draw();
    }

    // vector<BaseShape> Circle()Triangle()Circle()
    // 01162FD0 BASE?
    // 01162FD4 BASE?
    // 01162FD8 BASE?
    // 01162FDC BASE?

    return 0;
}

In vector<BaseShape *> pShapes, I understand that pShapes only stores pointers to the actual shapes. It is then easy to know how far to advance the memory address with ++it, as all pointers have the same size. The console output shows how *it jumps around in memory for "Triangle".

Now, my doubt comes when vector<BaseShape> shapes is used instead. Maybe my understanding is wrong, but I believe that shapes would store memory for BaseShape objects directly (more on this later). But if that is correct, then when I push_back a Circle or a Triangle object into it, how is it even possible to store all objects contiguously in memory? That doesn't sound possible, as Circle and Triangle have different sizes in memory, and their memory must be contiguous to that of the BaseShape object (e.g. [BaseShape mem][Circle mem]). Even more, how does ++it know exactly how much memory is needed to jump in order to get the next object? In the console output, I can see that ++it only increased the memory address by 4, which leads me to conclude that somehow only the BaseShape part was stored in memory. Is the [Circle mem] just dropped? Because I can see the Circle constructor was called (as seen in // vector<BaseShape> Circle()Triangle()Circle()).

I was half expecting the code not to compile, or at least a warning that storing a Circle or Triangle in shapes would lose information, but there was none and the code kinda worked. The 'kinda' is because draw() was early bound to BaseShape rather than late bound to Circle or Triangle as a virtual method should be. This signals that shapes is storing contiguous BaseShape memory blocks...

I'm not trying to solve a problem here; I'm just curious about how C++ works and where my misunderstanding of std::vector, pointers, or iterators lies.

But you can't push other `MyDerived` into a `MyClass` vector. Unless you create a specialized variadic type that can store your derived classes with something like a union (*something like std::any with SBO optimization*). Which would then be limited to a fixed number of types known at compile time. Basically the vector doesn't need to know the size of the derived classes because they simply cannot be added to it without intentionally being dumb. — SLC, Feb 20 '22 at 14:16
@Salvage yes but the derived type is converted to the base type in the `push_back` function. So you're still pushing a base type in the vector. You simply may not know about it. — SLC, Feb 20 '22 at 14:20
Because you can't then `cout << my_vec.front().i` and expect valid behavior. edit: maybe have a cast there — SLC, Feb 20 '22 at 14:21
@SLC Ah, sure, that's true of course. It sounded to me like you were trying to imply that it wouldn't compile. So probably a misunderstanding on my part! Although to be honest, you don't have to be intentionally dumb to cause slicing, you just have to be familiar with languages where value and reference semantics use the same syntax (e.g. C#) where the pretty much equivalent syntax would be working as "expected". — Salvage, Feb 20 '22 at 14:24
@Salvage the `intentionally dumb` is for anyone who's new to this concept and may think they can outsmart the compiler (*spoiler alert: you can't!*). I was there and the results are never pretty. You just have to accept the reality instead of expecting C++ to work like other languages you may have used in the past. It's just a common pitfall. — SLC, Feb 20 '22 at 14:27

score 6 · Accepted Answer · answered Feb 20 '22 at 14:16

When storing BaseShapes in a vector by value you'll experience what is called object slicing.

Basically, all information that only the derived classes contain is discarded, and only the base class's information is actually stored. All objects will behave as BaseShape objects would, with the one caveat that class invariants may be broken by the slicing.
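To make the slicing and the fixed iterator stride concrete, here is a minimal sketch. It reuses the names BaseShape and Circle from the question but replaces draw() with a hypothetical name() member so the result can be inspected; the helper functions are illustrative assumptions, not part of the original code. It suggests that a vector<BaseShape> only ever holds BaseShape-sized elements (so ++it advances by sizeof(BaseShape)), while a vector of pointers keeps the dynamic type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct BaseShape {
    virtual ~BaseShape() = default;
    virtual std::string name() const { return "BaseShape"; }
};

struct Circle : BaseShape {
    std::string name() const override { return "Circle"; }
    void* extra[2] = {nullptr, nullptr};  // derived-only state, dropped by slicing
};

// The vector element is constructed as a BaseShape from the Circle's base
// subobject, so the stored object's dynamic type is BaseShape.
std::string name_stored_by_value() {
    std::vector<BaseShape> shapes;
    shapes.push_back(Circle{});
    return shapes.front().name();
}

// Storing (smart) pointers keeps the full Circle alive on the heap.
std::string name_stored_by_pointer() {
    std::vector<std::unique_ptr<BaseShape>> shapes;
    shapes.push_back(std::unique_ptr<BaseShape>(new Circle));
    return shapes.front()->name();
}

// Byte distance between two adjacent elements of a vector<BaseShape>.
std::size_t stride_in_bytes() {
    std::vector<BaseShape> shapes(2);
    const char* first = reinterpret_cast<const char*>(&shapes[0]);
    const char* second = reinterpret_cast<const char*>(&shapes[1]);
    return static_cast<std::size_t>(second - first);
}

// Optional self-check that exercises the helpers above.
void check_slicing_demo() {
    static_assert(sizeof(Circle) > sizeof(BaseShape), "Circle carries extra state");
    assert(name_stored_by_value() == "BaseShape");
    assert(name_stored_by_pointer() == "Circle");
    assert(stride_in_bytes() == sizeof(BaseShape));
}
```

Calling check_slicing_demo() from main runs all three checks. The stride equals sizeof(BaseShape) regardless of what was pushed, which matches the 4-byte steps in the question's 32-bit output.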

answered Feb 20 '22 at 14:16

Salvage


That makes a lot of sense. To see if I understood object slicing correctly, I'll try to explain what I now think happens: each time I construct an element of say `Circle`, their constructor gets called to generate a Circle [Base mem][Circle mem]. Then, when the Circle is `pushed` into the vector, only the [Base mem] is copied to a new object of type Base inside the vector, using object slicing. – nahuelarjonadev Feb 20 '22 at 17:21
@nahuelarjonadev Yep! – Salvage Feb 20 '22 at 17:29
I added cout statements for the copy constructors and destructors, and it all has now become crystal clear. I even learned an unexpected thing, which is that using {} initialization syntax apparently isn't smart enough to pre-allocate space in std::vector for the "compile-time" specified number of elements. A series of copy constructor and destructor calls happened for each element!! I thought using vector vec {a, b, c, d} syntax would lead to the compiler smartly pre-allocating 4 memory blocks prior to starting the copy process. I guess I know very little about c++ after all... – nahuelarjonadev Feb 20 '22 at 17:49
I think that behavior has more to do with the fact that the constructor, as called, creates a temporary [`std::initializer_list`](https://en.cppreference.com/w/cpp/utility/initializer_list), the elements of which are then copied into the vector, I don't think the vector itself is reallocating there. Also I'm fairly certain any half-decent optimizing compiler would elide those ctors, so maybe try compiling with `-O3` or similar. – Salvage Feb 20 '22 at 18:01
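The copies discussed in these comments can be counted with a small sketch. Counted is a hypothetical type introduced only for illustration, and the static counter is an assumption of this example. Because the copy constructor has an observable side effect, the copies from the `std::initializer_list` into the vector are still recorded here: one per element, with no reallocation involved. Reserving and using `emplace_back` avoids them.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts copy constructions.
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept {}
};
int Counted::copies = 0;

// Braced initialization builds a temporary std::initializer_list<Counted>;
// its elements are const, so the vector has to copy each one.
int copies_from_braced_list() {
    Counted::copies = 0;
    std::vector<Counted> v{Counted{}, Counted{}, Counted{}, Counted{}};
    return Counted::copies;
}

// Reserving first and constructing in place needs no copies at all.
int copies_from_emplace() {
    Counted::copies = 0;
    std::vector<Counted> v;
    v.reserve(4);
    for (int i = 0; i < 4; ++i) {
        v.emplace_back();
    }
    return Counted::copies;
}

// Optional self-check that exercises both helpers.
void check_copy_counts() {
    assert(copies_from_braced_list() == 4);
    assert(copies_from_emplace() == 0);
}
```

The destructor calls seen alongside the copies would then belong to the temporaries in the initializer list, which are destroyed once the vector has been built.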
Try it out with `noexcept` move constructors too. Some code in `vector` uses the method `move_if_noexcept`. I've forgotten the exact details though. – user904963 Feb 20 '22 at 19:55
@user904963 Implicitly generated move constructors are `noexcept` by default (if all members are as well), so that shouldn't change anything here, but you're right in general of course :) – Salvage Feb 20 '22 at 19:58
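The effect of `noexcept` on reallocation can be sketched as follows. Tracked is a hypothetical type whose template flag decides whether its move constructor is declared `noexcept`; the counters and helper are assumptions of this example. Forcing a reallocation with `reserve` should relocate the existing element by move when moving cannot throw, and fall back to copying otherwise, which is how `vector` keeps its strong exception guarantee.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type; NoexceptMove controls the move constructor's
// exception specification.
template <bool NoexceptMove>
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept(NoexceptMove) { ++moves; }
};
template <bool NoexceptMove> int Tracked<NoexceptMove>::copies = 0;
template <bool NoexceptMove> int Tracked<NoexceptMove>::moves = 0;

// Returns how many copy constructions happen when a vector holding one
// element is forced to reallocate.
template <bool NoexceptMove>
int copies_during_reallocation() {
    using T = Tracked<NoexceptMove>;
    std::vector<T> v;
    v.reserve(1);
    v.emplace_back();             // constructed in place, no copy or move
    T::copies = 0;
    T::moves = 0;
    v.reserve(v.capacity() + 1);  // forces a move or copy into new storage
    return T::copies;
}

// Optional self-check that exercises both instantiations.
void check_reallocation_policy() {
    assert(copies_during_reallocation<true>() == 0);   // moved
    assert(copies_during_reallocation<false>() == 1);  // copied instead
}
```

For the classes in the question the move constructors are implicitly generated, so, as noted above, this distinction should not change the observed behavior there.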
