49

Can anybody explain the memory layout of

std::vector<std::array<int, 5>> vec(2)

Does it provide a contiguous memory block for a 2D array with 2 rows of 5 elements?

To my understanding, the vector of vectors

std::vector<std::vector<int>> vec(2, std::vector<int>(5))

provides a memory layout of two contiguous arrays of 5 elements each, at different locations in memory.

Will it be the same for the vector of arrays?

Bathsheba
Const
  • Given the answers, if you want this, use `std::vector<int> vec(5*2)` and do 2D indexing yourself inside the flat 1D array. Maybe write a wrapper class for 2D indexing on top of a flat container, with either a templated or runtime-variable row length. You'd also want to expose a flat view so algorithms that just need to do something to every element without caring about 2D position can do that with one big loop, more efficiently. – Peter Cordes Jan 16 '19 at 00:17
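
The wrapper suggested in the comment above could look roughly like the following. This is only a minimal sketch with a runtime row length; the class name `Grid2D` and its members `at`, `flat`, `rows` and `cols` are hypothetical names chosen for illustration, not anything from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid of ints kept in one flat vector,
// so all elements are guaranteed to be contiguous.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // 2D access: row-major index into the flat storage.
    int& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    const int& at(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Flat view for algorithms that do not care about 2D position.
    std::vector<int>& flat() { return data_; }
    const std::vector<int>& flat() const { return data_; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

With this sketch, `Grid2D g(2, 5); g.at(1, 2) = 7;` writes to `g.flat()[7]`, and a single loop over `g.flat()` visits all ten elements.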

4 Answers

59

Arrays do not have any indirection, but just store their data "directly". That is, a std::array<int, 5> literally contains five ints in a row, flat. And, like vectors, they do not put padding between their elements, so they're "internally contiguous".

However, the std::array object itself may be larger than the set of its elements! It is permitted to have trailing "stuff" like padding. So, although likely, it is not necessarily true that your data will all be contiguous in the first case.

An int
+----+
|    |
+----+

A vector of 2 x int
+----+----+----+-----+        +----+----+
| housekeeping | ptr |        | 1  |  2 |
+----+----+----+-----+        +----+----+
                   |          ^
                   \-----------

An std::array<int, 5>
+----+----+----+----+----+----------->
| 1  |  2 |  3 |  4 |  5 | possible cruft/padding....
+----+----+----+----+----+----------->

A vector of 2 x std::array<int, 5>
+----+----+----+-----+        +----+----+----+----+----+----------------------------+----+----+----+----+----+----------->
| housekeeping | ptr |        | 1  |  2 |  3 |  4 |  5 | possible cruft/padding.... | 1  |  2 |  3 |  4 |  5 | possible cruft/padding....
+----+----+----+-----+        +----+----+----+----+----+----------------------------+----+----+----+----+----+----------->
                   |          ^
                   \-----------

And, even if it were, due to aliasing rules, whether you'd be able to use a single int* to navigate all 10 numbers would potentially be a different matter!

All in all, a vector of ten ints would be clearer, completely packed, and possibly safer to use.

In the case of a vector of vectors, a vector is really just a pointer plus some housekeeping, hence the indirection (as you say).
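
To see what a particular implementation actually does, the row stride can be measured. The following is a rough sketch, not a portable guarantee: the alias `Row` and the functions `rows_are_packed` and `row_stride_bytes` are names made up for this example, and it assumes a platform where `std::uintptr_t` exists and converted addresses behave like flat byte offsets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Row = std::array<int, 5>;  // alias introduced for this sketch

// True when Row carries no trailing padding on this implementation.
constexpr bool rows_are_packed() { return sizeof(Row) == 5 * sizeof(int); }

// Distance in bytes from the first int of row 0 to the first int of row 1.
// The addresses are compared as integers, so no int* is moved across rows.
std::size_t row_stride_bytes(const std::vector<Row>& v) {
    assert(v.size() >= 2);
    const auto first = reinterpret_cast<std::uintptr_t>(v[0].data());
    const auto second = reinterpret_cast<std::uintptr_t>(v[1].data());
    return static_cast<std::size_t>(second - first);
}
```

If `rows_are_packed()` is true, the stride equals `5 * sizeof(int)` and the ten ints sit back to back in memory. That says nothing about whether a single `int*` may legally be walked across both rows.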

Lightness Races in Orbit
  • 10
    According to the answers here, the data doesn't have to be contiguous: [Is the data in nested std::arrays guaranteed to be contiguous?](https://stackoverflow.com/q/9762662/580083). There is some discussion of this topic. Other discussions: [Does std::array of std::array have contiguous memory?](https://stackoverflow.com/q/47120994/580083) and [Is the size of std::array defined by standard](https://stackoverflow.com/q/19103244/580083). – Daniel Langr Jan 15 '19 at 10:57
  • 1
    IOW, while the memory allocated has to be contiguous, the array elements need not be. – MSalters Jan 15 '19 at 12:26
  • 1
    Ooh this answer just gets posher and posher. Upped to 13. – Bathsheba Jan 15 '19 at 13:18
  • 1
    @Bathsheba The diagram is appalling but oh well – Lightness Races in Orbit Jan 15 '19 at 13:18
  • 1
    @LightnessRacesinOrbit If you add a diagram for a vector of vectors, your answer will be "carved in stone" :) – Daniel Langr Jan 15 '19 at 17:49
  • @DanielLangr Bah maybe later – Lightness Races in Orbit Jan 15 '19 at 18:29
  • 3
    Note `static_assert(sizeof(std::array<int,5>)==sizeof(int)*5)` mitigates any padding (and passes in every version of every major compiler that has supported `std::array`). It does not mitigate against aliasing issues. – Yakk - Adam Nevraumont Jan 15 '19 at 21:19
  • @Yakk-AdamNevraumont After much consideration I would _probably_ consider this sufficient, given how widely it will pass. But a vector of ints is still better. – Lightness Races in Orbit Jan 16 '19 at 00:41
19

The big difference between std::vector and std::array is that std::vector contains a pointer to the memory it wraps, while std::array contains the actual array in itself.

That means a vector of vectors is like a jagged array.

For a vector of arrays, the std::array objects will be placed contiguously but separate from the vector object. Note that the std::array objects themselves may be larger than the array they contain, and if so then the data will not be contiguous.

The last bit also means that an array (plain C-style or std::array) of std::array may also not keep the data contiguously. The std::array objects in the array will be contiguous, but not the data.

The only way to guarantee contiguous data for a "multi-dimensional" array is nested plain C-style arrays.
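
As a small illustration of the nested C-style case, here is a sketch (the function name `fill_sequential` is invented for this example). An array type occupies exactly N times the size of its element type, so the `static_assert` below should hold on any conforming compiler.

```cpp
#include <cassert>
#include <cstddef>

// A 2 x 5 grid as nested plain arrays: sizeof leaves no room for padding
// between rows, because int[2][5] is exactly 2 objects of type int[5].
static_assert(sizeof(int[2][5]) == 2 * 5 * sizeof(int),
              "rows are packed back to back");

// Fills the grid with 1..10 in row-major order using nested loops, so each
// inner loop stays inside a single row.
void fill_sequential(int (&g)[2][5]) {
    int next = 1;
    for (auto& row : g)
        for (int& cell : row)
            cell = next++;
    assert(g[1][4] == 10);  // last cell holds the tenth value
}
```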

Some programmer dude
12

The C++ standard does not guarantee that std::array doesn't contain any payload at the end of the array, so alas you cannot assume that the first element of a subsequent array is just after the last element of a prior array.

Even if that were the case, the behaviour on attempting to reach any element in an array by pointer arithmetic on a pointer to an element in a different array is undefined. This is because pointer arithmetic is only valid within arrays.

The above also applies to a std::array<std::array>.

Bathsheba
6
static_assert(sizeof(std::array<int,5>)==5*sizeof(int));

The above guards against any padding at the end of a std::array. No major compiler fails the above to date, and I'd bet none will in the future.

If and only if the above fails, then std::vector<std::array<int,5>> v(2) will have a "gap" between the std::arrays.

This doesn't help as much as you'd like; a pointer generated as follows:

int* ptr = &v[0][0];

only has a domain of validity up to ptr+5, and dereferencing ptr+5 is undefined behavior.

This is due to aliasing rules; you aren't allowed to "walk" past the end of one object into another, even if you know it is there, unless you first round-trip to certain types (like char*) where less restricted pointer arithmetic is permitted.

That rule, in turn, exists to allow compilers to reason about what data is being accessed through which pointer, without having to prove that arbitrary pointer arithmetic will let you reach outside objects.

So:

struct bob {
  int x,y,z;
};

bob b {1,2,3};
int* py = &b.y;

no matter what you do with py as an int*, you cannot legally modify x or z with it.

*py = 77;
py[-1]=3;
std::cout << b.x;

The compiler can optimize the std::cout line to simply print 1, because although py[-1]=3 may attempt to modify b.x, doing so through that means is undefined behavior.

The same kind of restrictions prevent you from going from the first array in your std::vector to the second (ie, beyond ptr+4).

Creating ptr+5 is legal, but only as a one-past-the-end pointer. Comparing ptr+5 == &v[1][0] is also not specified in result, even though their binary values are absolutely going to be identical in every compiler on every major hardware system.
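
If a flat sequence of all ten ints is needed, one way to stay within the rules is to copy row by row rather than walking one pointer across rows. The following is a minimal sketch; the function name `flatten` is made up for this example.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Copies every row into one flat vector<int>.
std::vector<int> flatten(const std::vector<std::array<int, 5>>& rows) {
    std::vector<int> out;
    out.reserve(rows.size() * 5);
    for (const auto& row : rows) {
        // Each insert iterates within a single std::array, so no pointer
        // ever crosses from one row object into the next.
        out.insert(out.end(), row.begin(), row.end());
    }
    assert(out.size() == rows.size() * 5);
    return out;
}
```

The cost is one extra copy; if that matters, storing the data in a flat `std::vector<int>` from the start avoids it.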

If you want to go further down the rabbit hole, it isn't even possible to implement std::vector<int> within C++ itself due to these restrictions on pointer aliasing. Last I checked (which was before , but I didn't see a resolution in C++17) the standard committee was working on solving this, but I don't know the state of any such effort. (This is less of a problem than you might think, because nothing requires that std::vector<int> be implemented in standard-compliant C++; it must simply have standard-defined behavior. It can use compiler-specific extensions internally.)

Yakk - Adam Nevraumont