Are plain nested array of arrays guaranteed to be contiguous?

Question

I understand that nested std::array objects are not guaranteed to be strictly contiguous in memory; see for instance "Is the data in nested std::arrays guaranteed to be contiguous?" and other related questions. There might be some padding at the end of each sub-array. I cannot find the equivalent information about plain arrays. Is there a guarantee that all data within a plain array is contiguous in memory? For instance, with:

uint8_t A[3][7];

Is it safe to say that A[1]-A[0] is 7?
Subsidiary question: what should be the behaviour of the sizeof operator on a multidimensional array and on one of its sub-arrays?

NB ISO/IEC JTC1 SC22 WG21 N4860 §9.3.3.4-9 states:

[Note: When several “array of” specifications are adjacent, a multidimensional array type is created; only the first of the constant expressions that specify the bounds of the arrays may be omitted. [Example: int x3d[3][5][7]; declares an array of three elements, each of which is an array of five elements, each of which is an array of seven integers. The overall array can be viewed as a three-dimensional array of integers, with rank 3 × 5 × 7. Any of the expressions x3d, x3d[i], x3d[i][j], x3d[i][j][k] can reasonably appear in an expression. The expression x3d[i] is equivalent to *(x3d + i); in that expression, x3d is subject to the array-to-pointer conversion (7.3.2) and is first converted to a pointer to a 2-dimensional array with rank 5 × 7 that points to the first element of x3d. Then i is added, which on typical implementations involves multiplying i by the length of the object to which the pointer points, which is sizeof(int)×5 × 7. The result of the addition and indirection is an lvalue denoting the ith array element of x3d (an array of five arrays of seven integers). If there is another subscript, the same argument applies again, so x3d[i][j] is an lvalue denoting the jth array element of the ith array element of x3d (an array of seven integers), and x3d[i][j][k] is an lvalue denoting the kth array element of the jth array element of the ith array element of x3d (an integer). —end example] The first subscript in the declaration helps determine the amount of storage consumed by an array but plays no other part in subscript calculations. —end note]

If I understand it correctly, the data should be contiguous but the "on typical implementations" part bothers me. Thus I still need confirmation. (In which case, I fail to understand why std::array would not have the same guarantee).

[EDIT] The notion of "contiguous" turns out to be misleading in the case of nested sequences, and I think that existing posts on Stack Overflow are not completely clear about that.
Thus my actual questions would be:

1. Is the memory size only the sum of the sizes of all elements (no padding), so that here sizeof(A) would be sizeof(std::uint8_t)*3*7?
2. Is A a contiguous sequence of std::uint8_t, meaning that:

void foo(std::uint8_t);
std::uint8_t* p = &A[0][0];
for (size_t i = 0; i < 3*7; ++i, ++p) {
   foo(*p);
}

is valid?
3. Is an expression like std::memset(A,0,sizeof(std::uint8_t)*3*7) valid?

Some programmer dude · Answer 1 · 2023-08-01T12:56:44.190

3

sizeof A will be equal to 3 * 7 * sizeof(uint8_t).

sizeof A[0] will be equal to 7 * sizeof(uint8_t).

That is, the sizeof operator works as expected for any array.

Regarding A[1]-A[0], as both A[1] and A[0] are arrays (with the type uint8_t[7]), they will then decay to pointers to their first elements.

So the expression is really &A[1][0] - &A[0][0].

Because each sub-array is 7 elements, the result will be 7. Please note that the result would be 7 no matter the element type, because the result is in units of the element type, not in bytes.

And lastly to answer the question in the title:

[A]re plain nested array of arrays guaranteed to be contiguous

Yes they are. There is no padding in arrays. It doesn't matter what the base type is, a plain integer type, a structure or class object, or anything else.

edited Aug 01 '23 at 12:56

answered Aug 01 '23 at 10:25


1

*"the difference between &A[1][0] and &A[0][0]"*. They are not, pedantically, in the same array, so with C++ pointer arithmetic... ;-) – Jarod42 Aug 01 '23 at 11:24
1

regarding ```A[1]-A[0]``` [compiler explorer](https://godbolt.org/z/7TsvY3jh4) is showing otherwise. My interpretation: ```operator -``` is not defined for ```std::uint8_t[7]```, thus the compiler decays to ```std::uint8_t *``` and returns the distance between addresses, divided by ```sizeof(std::uint8_t)```. What do you think? – Oersted Aug 01 '23 at 12:29
@Oersted Yes you're correct. Stupid mistake by me there... :) I'll rewrite that part. – Some programmer dude Aug 01 '23 at 12:33
Yet this contradicts the same reference as above, §7.6.6-5.2: _When two pointer expressions P and Q are subtracted,... (5.2) - Otherwise, if P and Q point to, respectively, array elements i and j of the same array object x, the expression P - Q has the value i − j_ I'm confused: the experiment gives 7 while the standard says 1, as Someprogrammerdude did... – Oersted Aug 01 '23 at 12:39
1

@Oersted: `A[1]` is not a pointer. `&A[1] - &A[0]` would give 1 [Demo](https://godbolt.org/z/d9b6errhq) (which also shows that `A[1]-A[0]` is pedantically UB). – Jarod42 Aug 01 '23 at 14:01
2

@Jarod42. OK, here is my interpretation (of the clang output): ```A[x]``` is not a pointer, so a decay is required in order to apply ```operator -```, but then clang considers that they decay to the first element of each corresponding sub-array, hence the issue with _"subtracted pointers are not elements of the same array"_, which leads to the next line of the standard: _"Otherwise, the behavior is undefined"_ – Oersted Aug 01 '23 at 14:27

score 2 · Answer 2 · answered Aug 01 '23 at 17:47

According to my reading of the standard, A[1]-A[0] is undefined behaviour.

Pointer arithmetic is only defined when all pointers involved point to the same array, or one past the last element of the same array.

There are three arrays of interest: A, A[0], and A[1]. Let's see which of those are represented in the expression A[1]-A[0].

A[0] is an lvalue that refers to the first element of the array A. Since this element is itself an array, A[0] decays to a pointer to the first element of that array. It is equivalent to &A[0][0].

A[1] is an lvalue that refers to the second element of the array A. Since this element is itself an array, A[1] decays to a pointer to the first element of that array. It is equivalent to &A[1][0].

Thus, the expressions A[1] and A[0] have values that point to elements of two different arrays A[1] and A[0]. They cannot be used together as operands of a pointer arithmetic expression.

Wouldn't A[1] also point to the one-past-the-last element of A[0], that is, be the same as &A[0][7]? My answer is no, even though the hypothetical one-past-the-last element of A[0] has the same address as the first element of A[1]. &A[0][7] and &A[1][0] should compare equal, but there is nothing in the standard (that I can find) that guarantees that they are interchangeable in pointer arithmetic.

I agree, I will update my answer accordingly. I think it is important to first be clear about what "contiguous" actually means in C++ and I think that it should be expressed only in terms of pointer arithmetic, not memory address values. — Oersted, Aug 02 '23 at 07:32

Lucas Hendren · Answer 3 · 2023-08-02T09:03:44.847

1

Yes, it should be contiguous, and for the pseudocode A[1] - A[0] the result is 7, although it would be written as &A[1][0] - &A[0][0], since that gives the actual address locations. You can verify this with your link to NB ISO/IEC JTC1 SC22 WG21 N4860 §9.3.3.4-9.

In C++ the standard is that a multidimensional array is laid out in memory in a contiguous block. When the note refers to "typical implementations", the reason it uses that term is that different data types have different sizes, which changes the sizes and strides, but the array still behaves the same way and remains contiguous.

The sizeof operator behaves consistently as well. When you apply sizeof to a multidimensional array, it gives you the total size occupied by the entire array and if you apply sizeof to one of its sub-arrays, it will give you the size of that sub-array.

For your examples it would be 3*7*sizeof(uint8_t) for your main array, and 7*sizeof(uint8_t) for your subarray.

Now as for why this is different from std::array: std::array can use user-defined custom types that include padding, so it is not possible for std::array to guarantee that it is always contiguous. That being said, it is likely to be rare that it is not contiguous, but you can't guarantee it, and if you use types like uint8_t it will behave as I described above.

Additions:

From ISO/IEC JTC1 SC22 WG21 N4860 §7.6.2.4-2, definition of the sizeof operator:

When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element

7.6.2.4-2 clearly precludes the possibility of padding within nested arrays.

Based on this definition of sizeof provided by 7.6.2.4-2, you end up with the aggregate of the total size of all the sub-elements, with no padding bytes.

edited Aug 02 '23 at 09:03

answered Aug 01 '23 at 10:29


Your paragraph about `std::array` is wrong... With C-array, we also might have `MyObj obj[2][2]`. `std::array` is allowed to have extra member/padding after the array. – Jarod42 Aug 01 '23 at 11:17
So I don't really touch on padding, and I can add that in, but I don't know what's wrong with it based on your statement. I'm saying custom arrays can't be guaranteed to be contiguous; that doesn't preclude padding, and it specifically references ```MyObj``` potentially being different (as it's a custom type). Did you mean to comment on the other answer? That one specifically says no padding. – Lucas Hendren Aug 01 '23 at 11:20
I can add an example of a custom type and describe how it could have padding, to be more specific, but are you referring to my answer or the other one? The other one is the one that says no padding. I somewhat left that out and can go back and add it in. I do think I need to set up my math more correctly, in the form of the other answer, i.e. &A[1][0], but I would rather confirm the above and do this in one edit. – Lucas Hendren Aug 01 '23 at 11:27
`sizeof(std::array)` is not necessarily `N`. `MyObj` refers to *"because it doesn't know the type that will be passed in."*. `MyObj obj[2][2]` is contiguous. – Jarod42 Aug 01 '23 at 11:30
I can redo the phrasing, but "because it doesn't know the type that will be passed in." is meant to be more like "The user can pass in custom types, thus we can't say anything definite because of that". I see now that it could be read as "we don't programmatically know what's being passed in", and I can clarify that. – Lucas Hendren Aug 01 '23 at 11:40
The fact that `T` can be anything has nothing to do with the fact that `std::array` might legally be implemented with something like `template <class T, std::size_t N> struct array { T data[N]; char dummy; };`, the C-array `T data[N]` is contiguous, for any `T`. I suggest reading the link in the OP's question about the `std::array` case. – Jarod42 Aug 01 '23 at 12:24
Sorry, but I have read the standard note several times and I'm still bothered by the _"typical implementations"_ part. Pointer arithmetic seems to deal with computing the addresses of elements inside an array, but it does not imply that there is no padding after the last element. The §9.3.3.4-9 wording remains ambiguous. Can pointer arithmetic be implemented in a different fashion? Can the subobject size be different? So far I cannot find wording saying that ```sizeof``` must behave like that, only that it may... – Oersted Aug 01 '23 at 12:50
Edit regarding sizeof: 7.6.2.4-2: When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element. I'm still missing a piece of the puzzle, but I think that this paragraph, together with the fact that data are contiguous inside an array, implies that pointer arithmetic cannot be anything other than what is given in §9.3.3.4-9, thus settling the point. – Oersted Aug 01 '23 at 12:57
Comment to myself ;). 7.6.2.4-2 clearly precludes the possibility of padding within nested arrays. Answer authors may complete their answers with this reference so that I can accept one. – Oersted Aug 01 '23 at 13:14
I am traveling at the moment, but Oersted, let me know if that is what you're looking for; I added a brief inclusion of 7.6.2.4-2. I also made more clarifications around padding and std::array to try to address the issues Jarod42 brought up; this included a reference to a previous Stack Overflow post. – Lucas Hendren Aug 02 '23 at 09:05
*"types like `uint8_t` it will behave as I described above"* To clarify: with `std::array<std::array<uint8_t, N>, M>`, there is no guarantee of a contiguous block of `N*M` `uint8_t`. With `std::array<T, 42>`, there is a guarantee of a contiguous block of 42 `T`. – Jarod42 Aug 02 '23 at 12:41

Oersted · Accepted Answer · 2023-08-02T08:30:42.670

I sum up here the various answers and comments regarding nested arrays such as

std::uint8_t A[3][7]

Relevant questions would be:

1. Is the memory size only the sum of the sizes of all elements (no padding), so that here sizeof(A) would be sizeof(std::uint8_t)*3*7?
2. Is A a contiguous sequence of std::uint8_t, meaning that:

void foo(std::uint8_t);
std::uint8_t* p = &A[0][0];
for (size_t i = 0; i < 3*7; ++i, ++p) {
   foo(*p);
}

is valid?
3. Is an expression like std::memset(A,0,sizeof(std::uint8_t)*3*7) valid?

From ISO/IEC JTC1 SC22 WG21 N4860 §7.6.2.4-2, definition of the sizeof operator:

When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element

By immediate recursion, this leaves no other possibility: the size of a nested plain array is the total size of its sub-elements (no padding bytes of any kind).
Yet it does not imply that the array is contiguous with respect to the most nested type.

In order to understand that, I must give my own definition of "contiguous memory" as I failed to find an explicit definition within the standard. Thus my definition would be:
Saying that the memory locations of a sequence of objects are contiguous means that, from a pointer to the first element of the sequence, successively incrementing the pointer gives successive locations of the sequence objects, up to one past the last element.

Indeed from ISO/IEC JTC1 SC22 WG21 N4860 §7.6.6:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P...
(4.2) - Otherwise, if P points to an array element i of an array object x with n elements (9.3.3.4), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 <= i − j <= n.
(4.3) — Otherwise, the behavior is undefined.

Besides:

When two pointer expressions P and Q are subtracted,...
(5.2) - Otherwise, if P and Q point to, respectively, array elements i and j of the same array object x, the expression P - Q has the value i − j.
(5.3) - Otherwise, the behavior is undefined.

Yet A[x] is not a pointer but a std::uint8_t[7] that can decay to a pointer to its first element. Then, despite the contiguous memory layout, A[1] and A[0] decay to pointers to data in "semantically" different arrays. Thus, as (5.3) states, it is undefined behavior to try to evaluate A[1]-A[0].
On the other hand, &A[x] is the address of the xth element of A; A[1] and A[0] are elements of the same array A, so &A[1]-&A[0] is legal and evaluates to 1 (see 5.2).

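Thus the answer to 2 is no: p points to an element of A[0], so advancing it past the end of A[0] to reach the elements of A[1] is undefined behaviour (see 4.3).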

Besides

The description of multidimensional arrays in ISO/IEC JTC1 SC22 WG21 N4860 §9.3.3.4-9 is, IMHO, only slightly confusing. I reproduce it here for the record:

[Note: When several “array of” specifications are adjacent, a multidimensional array type is created; only the first of the constant expressions that specify the bounds of the arrays may be omitted. [Example: int x3d[3][5][7]; declares an array of three elements, each of which is an array of five elements, each of which is an array of seven integers. The overall array can be viewed as a three-dimensional array of integers, with rank 3 × 5 × 7. Any of the expressions x3d, x3d[i], x3d[i][j], x3d[i][j][k] can reasonably appear in an expression. The expression x3d[i] is equivalent to *(x3d + i); in that expression, x3d is subject to the array-to-pointer conversion (7.3.2) and is first converted to a pointer to a 2-dimensional array with rank 5 × 7 that points to the first element of x3d. Then i is added, which on typical implementations involves multiplying i by the length of the object to which the pointer points, which is sizeof(int)×5 × 7. The result of the addition and indirection is an lvalue denoting the ith array element of x3d (an array of five arrays of seven integers). If there is another subscript, the same argument applies again, so x3d[i][j] is an lvalue denoting the jth array element of the ith array element of x3d (an array of seven integers), and x3d[i][j][k] is an lvalue denoting the kth array element of the jth array element of the ith array element of x3d (an integer). —end example] The first subscript in the declaration helps determine the amount of storage consumed by an array but plays no other part in subscript calculations. —end note]

Regarding question 3, my tentative answer would be: it is valid if the most nested type is trivially copyable.
From cppreference about std::memset, the relevant requirements are:

the std::size_t count argument must be less than or equal to the size in bytes of the array, which is the case;
the object pointed to by void* dest must be trivially copyable, which is the case, by immediate recursion, if the most nested type is trivially copyable.
