Is copying 2D arrays with "memcpy" technically undefined behaviour?

Question

An interesting discussion has arisen in the comments to this recent question. Although the language there is C, the discussion has drifted to what the C++ Standard specifies, in terms of what constitutes undefined behaviour when accessing the elements of a multidimensional array using a function like std::memcpy.

First, here's the code from that question, converted to C++ and using const wherever possible:

#include <iostream>
#include <cstring>

void print(const int arr[][3], int n)
{
    for (int r = 0; r < n; ++r) {       // n is the number of rows
        for (int c = 0; c < 3; ++c) {   // each row has exactly 3 columns
            std::cout << arr[r][c] << " ";
        }
        std::cout << std::endl;
    }
}

int main()
{
    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int arr_copy[3][3];
    print(arr, 3);
    std::memcpy(arr_copy, arr, sizeof arr);
    print(arr_copy, 3);
    return 0;
}

The issue is in the call to std::memcpy: the arr argument will yield (by decay) a pointer to the first int[3] subarray so, according to one side of the discussion (led by Ted Lyngmo), when the memcpy function accesses data beyond the third element of that subarray, there is formally undefined behaviour (and the same would apply to the destination, arr_copy).

However, the other side of the debate (to which mediocrevegetable1 and I subscribe) uses the rationale that each of the 2D arrays will, by definition, occupy contiguous memory and, as the arguments to memcpy are just void* pointers to those locations (and the third, size argument is valid), then there cannot be UB here.
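
The layout half of that argument can be checked directly. The following is a minimal sketch, not part of the original question; `same_start_address` is a hypothetical helper name. It shows only that the storage is contiguous and that every way of naming the start of the array yields the same address. It does not settle whether memcpy is formally allowed to read all of it.

```cpp
#include <cassert>

// An array has no padding between its elements, so three int[3] rows
// occupy exactly nine ints' worth of storage.
static_assert(sizeof(int[3][3]) == 3 * sizeof(int[3]), "rows are contiguous");
static_assert(sizeof(int[3]) == 3 * sizeof(int), "ints within a row are contiguous");

// Hypothetical helper: returns true if every way of naming the start of
// the array converts to the same void* value.
bool same_start_address()
{
    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    const void* whole = &arr;        // pointer to the int[3][3] object
    const void* row   = arr;         // decays to a pointer to the first int[3]
    const void* first = arr[0];      // decays to a pointer to the first int
    const void* elem  = &arr[0][0];  // pointer to the first int, spelled out
    return whole == row && row == first && first == elem;
}
```

The disagreement in the comments is about the type the pointer had before conversion to void*, which this check cannot observe.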

Here's a summary of some of the comments most pertinent to the debate, in case any "clean-up" occurs on the original question (bolding for emphasis mine):

I don't think there's any out-of-bounds here. Just like memcpy works for an array of ints, it works for an array of int [3]s, and both should be contiguous (but I'm not 100% sure). – mediocrevegetable1

The out of bounds access happens when you copy the first byte from arr[0][3]. I've never seen it actually fail, but, in C++, it has UB. – Ted Lyngmo

But the memcpy function/call doesn't do any array indexing - it's just given two void* pointers and copies memory from one to the other. – Adrian Mole

I can't say for sure if that matters in C. In C++ it doesn't. You get a pointer to the first int[3] and any access out of its range has UB. I haven't found any exception to that in the C++ standard. – Ted Lyngmo

I don't think the arr[0][3] thing applies. By that logic, I think copying the second int of an int array through memcpy would be UB as well. int [3] is simply the type of arr's elements, and the bounds of arr as a whole in bytes should be sizeof (int [3]) * 3. I'm probably missing something though :/ – mediocrevegetable1

Are there any C++ Language-Lawyers who can settle the matter – preferably with (an) appropriate citation(s) from the C++ Standard?

Also, relevant citations from the C Standard may be helpful – especially if the two language Standards differ – so I've included the C tag in this question.

Somewhat related C question, which cites the C standard, but is not specific to `memcpy`: [Cast T[\][\] to T*](https://stackoverflow.com/q/43851470/12149471) — Andreas Wenzel, Sep 25 '21 at 20:54
I'm sure there are dups about `memcpy` beyond the object bounds. (_C11 7.24.1 String function conventions p.1 «[If an array \[of characters comprising an object\] is accessed beyond the end of an object, the behavior is undefined.](http://port70.net/~nsz/c/c11/n1570.html#7.24.1p1)»_) — Language Lawyer, Sep 25 '21 at 21:02
I would think that library functions, being part of the implementation, would be exempt from some of the rules regarding how to access objects. — dbush, Sep 25 '21 at 21:02
Why the fact that arrays are 2D is relevant here? The question is simpler. «If one wants to `memcpy` from an object denoted by `s` to an object denoted by `d`, they write `memcpy(&d, &s, sizeof s)`. What gives one a permission, in case the objects are of array type, to pass pointers to the first elements of the arrays, i.e. `memcpy(d, s, sizeof s)`» (Or `memcpy(d, &s, sizeof s)`, or `memcpy(&d, s, sizeof s)`) — Language Lawyer, Sep 26 '21 at 03:32
Since I'm quoted, I'd just like to make my logic clear, as I feel I might not have been fully clear about what exactly I was saying in my original comments (it was 1am at the time :p). As an example, if you're copying one `int foo[3]` to another `int bar[3]` with `memcpy(bar, foo, sizeof bar);` I *think* everyone agrees this is well-defined. Just like that, `arr` and `arr_copy` are both simply arrays of `int [3]`s, and I think they should not behave differently than any other array type. HolyBlackCat definitely explains it better. — mediocrevegetable1, Sep 26 '21 at 18:36
@LanguageLawyer *they write memcpy(&d, &s, sizeof s).* Now witness this. `int n = readIntFromSomewhere(); char* a = malloc(n); char* b = malloc(n);` It looks like your position might be that it is not possible to memcpy the entire object pointed to by `a` to the object pointed to by `b` without invoking UB. — n. m. could be an AI, Sep 26 '21 at 22:55
@n.1.8e9-where's-my-sharem. At least, someone wrote about this (actually, I was waiting for a question about `memcpy((char*)&some_obj, ...`). This is the only case I think where the standard is defective: `memcpy` (and similar functions) should take into account cases when they receive a pointer to «bytes» of an object. With the restriction on `n` compatible with the last sentence of http://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7 — Language Lawyer, Sep 26 '21 at 23:07
@n.1.8e9-where's-my-sharem.: Neither the C nor C++ Standard is free of circumstances where part of the Standard, together with an implementation's documentation, would define a behavior of some actions, but some other part of the Standard would characterize the behavior of an overlapping set of actions as invoking Undefined Behavior. There would have been no reason for people writing specs in the 1980s or 1990s to expend ink mandating that implementations give priority to the "defined" behavior in cases where doing otherwise would have obviously been obtuse. — supercat, Sep 27 '21 at 02:24
@supercat TL;DR the standard is full of ct@p, pick what you like and ignore the rest. — n. m. could be an AI, Sep 27 '21 at 04:44
@n.1.8e9-where's-my-sharem.: It's not the Standard that's the problem--it's compiler writers who adopt a "code that does anything that isn't absolutely positively unambiguously defined by the Standard is broken" attitude that are the problem. Unfortunately, clang and gcc are shielded from market pressures by the fact that programmers who wish to release open source that others can build are limited to using compilers that are freely distributable, killing the market for quality commercial compilers. — supercat, Sep 27 '21 at 15:02
`memcpy()` is only safe in C++ with POD types (which `int32_t` is, of course.) Because more complex types cannot be copied this way, you must use an alternate approach such as `std::copy()` to do the work. If one wants to code consistently and cleanly, always using `std::copy()` would help prevent accidental misuse of `memcpy()` — Nadeem Taj, Sep 28 '21 at 08:09
@NadeemTaj Please note that this question *is* about a POD type passed to `std::memcpy`. — Bob__, Sep 28 '21 at 08:25

HolyBlackCat · Answer 1 · 2021-09-27T19:34:11.500

31

std::memcpy(arr_copy, arr, sizeof arr); (your example) is well-defined.

std::memcpy(arr_copy, arr[0], sizeof arr);, on the other hand, causes undefined behavior (at least in C++; not entirely sure about C).

Multidimensional arrays are 1D arrays of arrays. As far as I know, they don't get much (if any) special treatment compared to true 1D arrays (i.e. arrays with elements of non-array type).

Consider an example with a 1D array:

int a[3] = {1,2,3}, b[3];
std::memcpy(b, a, sizeof(int) * 3);

This is obviously legal.¹

Notice that memcpy receives a pointer to the first element of the array, and can access other elements.

The element type doesn't affect the validity of this example. If you use a 2D array, the element type becomes int[N] rather than int, but the validity is not affected.

Now, consider a different example:

int a[2][2] = {{1,2},{3,4}}, b[4];
std::memcpy(b, a[0], sizeof(int) * 4);
//             ^~~~

This one causes UB², because memcpy is given a pointer to the first element of a[0], so it can only access the elements of a[0] (a[0][i]), and not a[j][i].

But, if you want my opinion, this is a "tame" kind of UB, likely to not cause problems in practice (but, as always, UB should be avoided if possible).
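
For code that wants to stay clear of the disputed forms altogether, one option is to pass pointers to the whole array objects, which is the form Language Lawyer argues for in the comments. The following is a minimal sketch under that assumption; `copy_array` is a hypothetical helper, not something from the standard library.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies one array object into another of the same type.
// Taking the arrays by reference keeps their full type (e.g. int[2][2]),
// so &dst and &src point to the whole array objects and sizeof(A) is the
// size of exactly that object.
template <typename A>
void copy_array(A& dst, const A& src)
{
    static_assert(std::is_array<A>::value, "copy_array expects array objects");
    static_assert(std::is_trivially_copyable<A>::value,
                  "memcpy requires a trivially copyable type");
    std::memcpy(&dst, &src, sizeof(A));
}
```

A call such as `copy_array(b, a)` with `int a[2][2]` and `int b[2][2]` compiles, while mismatched array types (for example `int[4]` and `int[2][2]`) are rejected at compile time because `A` cannot be deduced consistently.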

¹ The C++ standard doesn't explain memcpy, and instead refers to the C standard. The C standard uses somewhat sloppy wording:

C11 (N1570) [7.24.2.1]/2

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

A pointer to the first (or any) element of an array points only to that element, not to the entire array, even though the entire array is reachable through said pointer. Thus, if interpreted literally, it appears that @LanguageLawyer is right: if you give memcpy a pointer to an array element, you're only allowed to copy that single element, and not the successive elements.

This interpretation contradicts common sense, and most probably wasn't intended.

E.g. consider the example in [basic.types.general]/2, which applies memcpy to an array using a pointer to the first element (although examples are non-normative):

constexpr std::size_t N = sizeof(T);
char buf[N];
T obj;
std::memcpy(buf, &obj, N);
std::memcpy(&obj, buf, N);

² This is moot, because of the problematic wording for memcpy described above.

I'm not entirely sure about C, but for C++, there are strong hints that this is UB.

Firstly, consider a similar example that uses std::copy_n, attempting to perform an element-wise copy rather than a byte-wise one:

#include <algorithm>

consteval void foo()
{
    int a[2][2] = {{1,2},{3,4}}, b[2][2] = {{1,2},{3,4}};
    std::copy_n(a[0], 4, b[0]);
}

int main() {foo();}

Running functions at compile-time catches most forms of UB (it makes the code ill-formed), and indeed compiling this snippet gives:

error: call to consteval function 'foo' is not a constant expression
note: cannot refer to element 4 of array of 2 elements in a constant expression
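
For contrast, here is a minimal sketch of an element-wise copy that stays inside each row's bounds and is therefore accepted in a constant expression. It uses plain loops rather than std::copy_n so that it does not depend on C++20 constexpr algorithms; `sum_after_elementwise_copy` is a hypothetical name chosen for illustration.

```cpp
#include <cassert>

// Hypothetical helper: copies a into b one element at a time, indexing
// each row only within its own bounds, then sums b so that the result
// can be checked at compile time.
constexpr int sum_after_elementwise_copy()
{
    int a[2][2] = {{1, 2}, {3, 4}};
    int b[2][2] = {};
    for (int r = 0; r < 2; ++r) {
        for (int c = 0; c < 2; ++c) {
            b[r][c] = a[r][c];  // never forms a[0][2] or beyond
        }
    }
    return b[0][0] + b[0][1] + b[1][0] + b[1][1];
}

// Constant evaluation would reject the function if it contained
// the kind of out-of-row access diagnosed above.
static_assert(sum_after_elementwise_copy() == 10, "all four elements copied");
```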

The situation with memcpy is less certain, because it performs a byte-wise copy. This whole topic appears to be vague and underspecified.

Consider the wording for std::launder:

[ptr.launder]/4

A byte of storage b is reachable through a pointer value that points to an object Y if there is an object Z, pointer-interconvertible with Y, such that b is within the storage occupied by Z, or the immediately-enclosing array object if Z is an array element.

In other words, given a pointer to an array element, all elements of the said array are reachable through that pointer (non-recursively, i.e. through &a[0][0] only a[0][i] are reachable).

Formally, this definition is only used to describe std::launder (the fact that it can't expand the reachable region of the pointer given to it). But the implication seems to be that this definition summarizes reachability rules described by other parts of the standard ([static.cast]/13, notice that reinterpret_cast is defined through the same wording; also [basic.compound]/4).

It's not entirely clear if said rules apply to memcpy, but they should. Because otherwise, the programmer would be able to disregard reachability using library functions, which would make the concept of reachability mostly useless.

edited Sep 27 '21 at 19:34

answered Sep 25 '21 at 20:56

HolyBlackCat


3

_`std::memcpy(arr_copy, arr, sizeof arr);` (your example) is well-defined._ Except that it is not. – Language Lawyer Sep 25 '21 at 20:59
@AdrianMole I've linked the relevant quote from the C standard in the comment under the question, try to search for dups. There should be. – Language Lawyer Sep 25 '21 at 21:09
1

I'm with HolyBlackCat (for now, at least) - the arguments are pointers to the first elements of arrays of arrays, not to the first elements of any subarrays. – Adrian Mole Sep 25 '21 at 21:12
1

@LanguageLawyer Most of the dupes are about a popular case of passing `&arr[0][0]` to functions, which is indeed UB. Can you elaborate how your quote makes it UB? All I see is *"If an array is accessed beyond the end of an object, the behavior is undefined."*. For 1D arrays, given a pointer to an element, the "object" is clearly the whole array, and since multidimensional arrays are not mentioned, we have to assume that for them we also get the whole array, right? – HolyBlackCat Sep 25 '21 at 21:14
1

_For 1D arrays, the "object" is clearly the whole array_ **The object** is clearly **the object** pointed to by a `memcpy` argument (Like in [The `memcpy` function copies n characters from **the object** pointed to by `s2` into **the object** pointed to by `s1`](http://port70.net/~nsz/c/c11/n1570.html#7.24.2.1p2)). If you pass the pointer to the 1D array — it is the object. If to its first element — the object is its first element. – Language Lawyer Sep 25 '21 at 21:16
5

@LanguageLawyer `strcpy` is bound by the [same limitations](http://port70.net/~nsz/c/c11/n1570.html#7.24.1p1), and it always receives `const char *` as input parameter. By your logic, giving it a non-empty string would always cause UB, correct? – HolyBlackCat Sep 25 '21 at 21:18
_By your logic, giving it a non-empty string would always cause UB, correct?_ I don't see this. Could you elaborate? – Language Lawyer Sep 25 '21 at 21:26
@LanguageLawyer Firstly, do you agree that arrays of arrays (aka multidimensional) have no special treatment compared to arrays of non-arrays? – HolyBlackCat Sep 25 '21 at 21:33
2

If yes, then: You say *"If you pass the pointer to the 1D array — it is the object. If to its first element — the object is its first element."*. But if you do `char a[4], b[] = "xyz"; strcpy(a, b);`, the second arg is a pointer to a single character (pointer to whole array would be `&b`), but obviously the function can access the whole array? – HolyBlackCat Sep 25 '21 at 21:35
@HolyBlackCat, I guess "beyond the end of an object" is referring to the memory region that the pointer originated from, it is whole `arr`. Otherwise copying anything more than one element from the array would be UB if this text is taken literally. – tstanisl Sep 25 '21 at 21:36
2

`strcpy` has different requirements than `memcpy`. Yes, `char a[4], b[] = "xyz"; strcpy(a, b);` is ok. It doesn't mean that `char a[4], b[] = "xyz"; memcpy(a, b, 4);` is ok. – Language Lawyer Sep 25 '21 at 21:37
_It receives pointers to first elements of arrays, but it can access the whole arrays._ «Can» in which sense? «Works for me»? – Language Lawyer Sep 25 '21 at 21:40
5

@LanguageLawyer We both know that your second example is ok by common sense. Even if it turned out to be technically UB, this would be an obvious defect in the standard. Trying to avoid this supposed "UB" makes no sense. – HolyBlackCat Sep 25 '21 at 21:40
_We both know that your second example is ok by common sense_ **I** don't think so. If you apply `sizeof` to some object in `memcpy` third arg, you shall pass a pointer to this object in `memcpy`'s first or second arg. Or you don't agree that arrays have no special treatment (by `memcpy`) compared to non-arrays? – Language Lawyer Sep 25 '21 at 21:44
5

Sorry but arguments to `memcpy` point to voids and not to the whatever arrays you try to pass... – numzero Sep 25 '21 at 23:37
4

I'm not convinced that the second expression is undefined behaviour as claimed. `std::memcpy(arr_copy, arr[0], sizeof arr)` doesn't access anything outside the bounds of the object `arr`, so it's safe. `(void*)arr[0] == (void*)arr`. – Toby Speight Sep 26 '21 at 16:04
1

@TobySpeight Even though they're equal, I'm unsure if you're formally allowed to `reinterpret_cast` between them (or, I should say, the cast is ok, but dereferencing the result may or may not be UB). – HolyBlackCat Sep 26 '21 at 18:17
1

@TobySpeight Consider [a similar example](https://gcc.godbolt.org/z/xWo4M5vWf). Running functions at compile-time checks for most UB, and the example doesn't compile because the UB is detected. This is slightly different than the `memcpy` example (since it takes `void *`), but I argue that the same principles apply. Also see the definition of "reachability through a pointer" for [`std::launder`](https://en.cppreference.com/w/cpp/utility/launder#Notes). – HolyBlackCat Sep 26 '21 at 18:26
Note that your godbolt example is not UB; on the contrary, it is entirely defined, namely forbidden with constexpr. Outside constexpr you can of course always copy existing objects through char pointers (which is what memcpy does semantically); it does not matter in the least how you know that there are objects. See [my answer](https://stackoverflow.com/a/69339536/3150802) even though it doesn't say more, really, than that. – Peter - Reinstate Monica Sep 26 '21 at 22:54
3

Also, since `&arr`, `arr`, `arr[0]` and `&arr[0][0]` are guaranteed to be converted to the same void pointer logically either one of them can be used interchangeably with memcpy. The function cannot know, should not know and doesn't want to know more than that n bytes exist starting at that address. (I realize this is what Toby said in other words.) – Peter - Reinstate Monica Sep 26 '21 at 22:57
1

*Even if it turned out to be technically UB, this would be an obvious defect in the standard.* The authors of C89 and every standard derived from it have assumed that in cases where an action was simultaneously defined and "undefined" by the Standard, compiler writers would seek to give priority to the former except in cases where their customers would benefit from them doing otherwise. The fact that they didn't expend ink trying to enumerate all of the cases where compilers must give priority to the former was an editorial choice, not a defect. – supercat Sep 27 '21 at 03:01
1

@HolyBlackCat: The only "defect" involved is a failure to have made clear that the choice of whether to give priority to the definition or "undefinition" is viewed as a Quality of Implementation issue outside the Standard's jurisdiction. Any non-garbage implementation should likely support the `memcpy` paradigm whether or not the Standard would require non-garbage implementations to do likewise. – supercat Sep 27 '21 at 03:03
@Peter - thanks for rephrasing my clumsy attempt much more clearly! :-) – Toby Speight Sep 27 '21 at 06:49
@Peter-ReinstateMonica `memcpy` itself might not care, but the outside code can (in theory, at least), be optimized under assumption that `memcpy` didn't touch anything other than what it was given. I don't have rock-solid evidence, but I believe it implicitly follows from the reachability rules of `std::launder`. – HolyBlackCat Sep 27 '21 at 07:15
@HolyBlackCat The implementation is emphatically not allowed to ignore the fact that I pass `sizeof(int) * 4` to memcpy! It knows what memcpy is given and should respect it. ;-) Humor aside, I see what you say; but I think it is always possible to jump around in aggregates via offsets, starting with a given element address. The implementation cannot assume pedestrian code. – Peter - Reinstate Monica Sep 27 '21 at 07:51
3

@LanguageLawyer Looked through the spec again, and it seems you're [technically correct](https://www.youtube.com/watch?v=hou0lU8WMgo), [again](https://stackoverflow.com/q/62329008/2752075). I still think that we should disregard this "UB" in real life, it just shows how broken the standard wording on the topic is. – HolyBlackCat Sep 27 '21 at 19:37
_you're technically correct, again_ As always. _consider the example in [basic.types.general]/2, which applies `memcpy` to an array using a pointer to the first element_ I can extend [my comment](https://stackoverflow.com/questions/69329884/is-copying-2d-arrays-with-memcpy-technically-undefined-behaviour/69329970?noredirect=1#comment122555851_69329884) by saying that if a `memcpy` arg points to an element of an array of char type, then the description of the behavior should be different. Since there is no need to «treat the object as array of char type» anymore. – Language Lawyer Sep 27 '21 at 19:54
1

_The C standard uses somewhat sloppy wording_ I'd say it is pretty exact. – Language Lawyer Sep 27 '21 at 20:37
Fun fact: GCC and clang still treat `memcpy(dst[0], src, size)` as copying into a 16-byte `int[2][2]` destination. https://godbolt.org/z/8537benjj. (See update to my answer). So they effectively define the behaviour even of that, whether or not ISO C++ leaves it undefined. – Peter Cordes Oct 01 '21 at 16:08
@PeterCordes Mhm, this is mostly a "formal" UB it seems, at least at this moment. – HolyBlackCat Oct 01 '21 at 18:35

Peter Cordes · Accepted Answer · 2021-10-01T16:05:53.940

It's well-defined, even if you use memcpy(arr_cpy, arr, size) rather than memcpy(&arr_cpy, &arr, size) (which @LanguageLawyer has finally explained is what they've been arguing for the whole time), for reasons explained by @HolyBlackCat and others.

The intended meaning of the standard is clear, and any language to the contrary is a defect in the standard, not something compiler devs are going to use to pull the rug out from under countless normal uses of memcpy (including 1D arrays) that don't cast int* to int (*)[N], especially since ISO C++ doesn't allow variable-length arrays.

Experimental evidence shows that compiler developers chose to interpret the standard as letting memcpy read from the whole outer object (array-of-array-of-int) pointed to by the void* arg, even if that void* was obtained as a pointer to the first element (i.e. to the first array-of-int):

If you pass a size that's too large, you do get a warning, and for GCC the warning even spells out exactly what object and what size it sees being memcpyed:

#include <cstring>

int dst[2][2];
void foo(){
    int arr[2][2] = {{1,1},{1,1}};
    std::memcpy(dst, arr, sizeof(arr));  // compiles cleanly
}

void size_too_large(){
    int arr[2][2] = {{1,1},{1,1}};
    std::memcpy(dst, arr, sizeof(arr)+4);
}

Using &dst, &src makes no difference here to warnings or lack thereof.
Godbolt compiler explorer for GCC and clang -O2 -Wall -Wextra -pedantic -fsanitize=undefined, and MSVC -Wall.

GCC's warning for size_too_large() is:

warning: 'void* memcpy(void*, const void*, size_t)' forming offset [16, 19] is  \
  out of the bounds [0, 16] of object 'dst' with type 'int [2][2]' [-Warray-bounds]
   11 |     std::memcpy(dst, arr, sizeof(arr)+4);
      |     ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
<source>:3:5: note: 'dst' declared here
    3 | int dst[2][2];

clang's doesn't spell out the object type, but does still show sizes:

<source>:11:5: warning: 'memcpy' will always overflow; destination buffer has size 16, but size argument is 20 [-Wfortify-source]
    std::memcpy(dst, arr, sizeof(arr)+4);
    ^

So it's clearly safe in practice with real compilers, a fact which we already knew. Both see the destination arg as being the whole 16-byte int [2][2] object.

However, GCC and clang are possibly less strict than the ISO C++ standard. Even with dst[0] as the destination (decaying to an int* rather than int (*)[2]), they both still report the destination size as 16 bytes with type int [2][2].

HolyBlackCat's answer points out that calling memcpy this way really only gives it the 2-element sub-array, not the whole 2D array, but compilers don't try to stop you from or warn about using a pointer to the first element to access any part of a larger object.

As I said, testing real compilers can only show us that this is well-defined on them currently; arguments about what they might do in future require other reasoning (based on nobody wanting to break normal uses of memcpy, and the standard's intended meaning).

ISO standard's exact wording: arguably a defect

The only question is whether there's any merit to the argument that the standard's wording is defective in how it identifies the object that "beyond the end of an object" refers to: is it limited to the single pointed-to object left after array-to-pointer "decay" when passing an arg to memcpy? (And yes, that would be a defect in the standard; it's widely assumed that you don't need and shouldn't use &arr with an array type for memcpy, or basically ever AFAIK.)

To me, that sounds like a misinterpretation of the standard, but I may be biased because I of course want to read it as saying what we all know is true in practice. I still think that having it be well-defined is a valid interpretation of the wording in the standard, but the other interpretation may also be valid. (i.e. it could be ambiguous whether it's UB or not, which would be a defect.)

A void* pointing to the first element of an array can be cast back to an int (*)[2] to access the whole array object. That isn't how memcpy uses it, but it shows that the pointer hasn't lost its status as a pointer to the whole N-dimensional array. I think the authors of the standard are assuming this reasoning, that this void* can be considered a pointer to the whole object, not just the first element.
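
A minimal sketch of that round trip, with `last_element_via_void` as a hypothetical helper name. It converts the decayed pointer to void* and back to its original type, then reaches the last int of the 2D array through it.

```cpp
#include <cassert>

// Hypothetical helper: round-trips the decayed array pointer through void*
// and reads the last element of the 2D array through the result.
int last_element_via_void()
{
    int arr[2][2] = {{1, 2}, {3, 4}};
    void* p = arr;                           // int (*)[2] converted to void*
    auto rows = static_cast<int (*)[2]>(p);  // back to the original pointer type
    return rows[1][1];                       // second row, second int
}
```

Here `rows` points to the first `int[2]` element of `arr`, so `rows[1]` is ordinary indexing within `arr`, and `[1]` then indexes within that row.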

However, it's true that there's special language for how memcpy works, and a formal reading could argue that this doesn't let you rely on normal C assumptions about how memory works.

But the UB interpretation allowed by the standard is not how anyone wants it to work or thinks it should. And it would apply to 1D arrays, so this interpretation conflicts with standard examples of using memcpy that are well-known / universally assumed to work. So any argument that the wording in the standard doesn't quite match this is an argument that there's a defect in the wording, not that we need to change our code and avoid this.

There's also no motivation for compiler devs to try to declare this UB because there's very little optimization to be had here (unlike with signed overflow, type-based aliasing, or assumption of no NULL deref).

A compiler assuming that runtime-variable size must only affect at most the whole first element for the pointer type that got cast to void* wouldn't allow much optimization in real code. It's rare for later code to only access elements strictly after the first, which would let the compiler do constant-propagation or similar things past a memcpy that was intended to write it.

(As I said, everyone knows this isn't what the standard intended, unlike with clear statements about signed overflow being UB.)

The C++ standard is essentially silent about memcpy and refers to the C standard; the C standard appears to assume that a bytewise copy doesn't need much of an explanation, a notion I agree with, and is otherwise unconcerned about the ways we arrived at the void pointers, as it should: They are void pointers. That a compiler warns about out-of-bounds access for known objects is almost irrelevant here: Pass the addresses through a function in a different TU and the information about the source object is not accessible any longer anyway. The only question is whether the memory is accessible. — Peter - Reinstate Monica, Sep 26 '21 at 23:07
@Peter-ReinstateMonica: Some who are arguing that this memcpy could be UB are basing the argument on a reading of the standard which only defines the behaviour for memcpy to access *the object* it gets passed a pointer to. So the question becomes, which object is that; the whole 2D array, or the first 1D array element of it, when you have a true array, not just a pointer to one. (It seems from \@LanguageLawyer's arguments that calling a `foo(int (*)[2])` function which in turn calls memcpy should be fine since you already have a pointer, or maybe they think one should use `&arr` there, too?) — Peter Cordes, Sep 27 '21 at 00:00
@Peter-ReinstateMonica: It's a well-known fact that passing args through non-inline functions will hide UB from the compiler, depriving it of the opportunity to capriciously break code that assumes any asm-level thinking like bytes are just bytes. Anyway, that's irrelevant; the key is the warning I got containing *positive* evidence that it does agree we passed a 16-byte object. — Peter Cordes, Sep 27 '21 at 00:05
@PeterCordes: What about the possibility that this is one of many actions which the authors of the Standard expected compiler writers to support with or without a mandate, and which consequently isn't actually defined by the Standard but should be processed meaningfully *anyhow*? — supercat, Sep 27 '21 at 02:46
@supercat: Maybe. I think it helps that a plain C function could safely access those bytes by casting the `void*` back to a pointer-to-array-of-int, so yeah it would make sense for the standard to assume people were thinking about it that way, not looking for ways to decide that an obviously-sensible thing was UB. Any more pessimistic reading just seems like making life more difficult for oneself for no reason, unlike cases where the standard clearly does say something is UB. — Peter Cordes, Sep 27 '21 at 03:11
@PeterCordes: What is tragic is that people who pushed the idea that UB gave unlimited license for compiler nonsense weren't promptly responded to with "Yes, a conforming but garbage compiler could do that, but only garbage-quality compilers would use the Standard as an excuse to create needless obstacles to the things programmers are trying to do." — supercat, Sep 27 '21 at 14:55
@supercat: I agree with that sentiment to some degree, but I also like the optimizations it enables, e.g. for widening signed-int loop counters used as array indices. Perhaps the right solution was a new language like Rust, but it wasn't designed until after these two conflicting factors really came to light with C, benefiting from seeing how that played out and the problems aggressive UB-won't-happen optimization created in C. In Rust, we can have wrapping operations if we want on any integer type, for example, and assume-won't-wrap operations even on unsigned. IDK about e.g NULL-checks. — Peter Cordes, Sep 27 '21 at 15:43
@supercat: But unfortunately in some ways, C already got to a point where it's not safe to use asm-based reasoning for how computers and memory work to figure out what you can do in C. IDK if many compiler devs who originally thought these optimizations were a good idea has regretted it, or at least sees bigger usability downsides than they realized before. (I think even if so, we probably won't see many major compilers rolling back such optimizations, because compiler versions that break such code are out there.) — Peter Cordes, Sep 27 '21 at 15:49
@supercat: If that's the case, perhaps the need for a better language (instead of making C harder to use) to enable such optimizations while still providing ways to write safe code wasn't visible or obvious to compiler devs until too late? I agree the current situation with C is not great, and makes it a hard language to truly learn. — Peter Cordes, Sep 27 '21 at 15:52
@PeterCordes: IMHO, the proper remedy would be to explicitly recognize many of the "popular extensions" alluded to in the C89/C99 Rationale documents, and allow a program to either compile successfully or refuse compilation based upon whether an implementation supports them. At present, the Standard's definitions of "Conforming C Program" and even "Conforming C Implementation" are so loose as to be basically meaningless, but if it were to recognize categories of implementation and program such that an implementation would either have to process a program meaningfully *or refuse to do so*... — supercat, Sep 27 '21 at 16:05
@PeterCordes: ...then the Standard could exercise meaningful normative authority over programs and implementations. A conforming implementation could either process signed integer multiplication in a way which never has side effects, or refuse to run programs that demand such treatment. An implementation that accepts a program that demands such treatment but behaves in wonky fashion in case of overflow, however, would be non-conforming, as would a program that relies upon side-effect-free treatment without demanding it. — supercat, Sep 27 '21 at 16:07
@PeterCordes: What's needed isn't a "new" language, but rather a recognition of the language that implementations process with optimizations disabled, and a recognition that "optimizations" that make tasks more difficult than they would be in the former language aren't really optimizations. — supercat, Sep 27 '21 at 16:24

Peter - Reinstate Monica · Answer 3 · 2021-09-27T08:49:20.087

15

With all due respect, HolyBlackCat is utterly wrong, on first principles. My C17 standard draft says in 7.24.1: "For all functions in this subclause [containing memcpy], each character shall be interpreted as if it had the type unsigned char." The C standard doesn't really make any type considerations for these trivial functions: memcpy copies memory. As far as semantics are considered at all, the memory is treated as a sequence of unsigned characters. Therefore, the following first C principle applies:

As long as there is an initialized object at an address you can access it through a char pointer.

Let's repeat it for emphasis and clarity:

Any initialized object can be accessed by a char pointer.

If you know that an object is at a specific address 0x42, for example because the hardware of your computer maps the x coordinate of your mouse there, you can convert that into a char pointer and read it. If the coordinate is a 16 bit value you can read the next byte too.
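As a rough illustration of that principle (not part of the original answer), here is a minimal C++ sketch. It assumes an ordinary 16-bit variable stands in for the memory-mapped coordinate, and the name `roundtrip_through_bytes` is made up for this example. It reads both bytes of the object through an `unsigned char` pointer and rebuilds an equal value from them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the 16-bit "mouse coordinate" in the text.
// Reads every byte of `value` through an unsigned char pointer, rebuilds
// a second object from those bytes, and returns it.
std::uint16_t roundtrip_through_bytes(std::uint16_t value)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);

    unsigned char bytes[sizeof value];
    for (std::size_t i = 0; i < sizeof value; ++i) {
        bytes[i] = p[i];   // byte-wise read of the object representation
    }

    std::uint16_t rebuilt = 0;
    std::memcpy(&rebuilt, bytes, sizeof rebuilt);
    assert(rebuilt == value);   // same bytes, same value
    return rebuilt;
}
```

Calling `roundtrip_through_bytes(0x1234)` should hand back `0x1234` regardless of byte order, because the bytes are written back in the same order they were read.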

Nobody cares how you know that there is an integer: If there is one, you can read it. (Peter Cordes noted that there is no guarantee that you can arrive at a valid address (or at least, at the expected address) through pointer arithmetic from an unrelated object because of possible segmented memory architectures. But this is not the example case: The entire array is one object and must reside in a single segment.)

Now that we have 3 arrays of 3 ints we know that 9 ints are placed consecutively in memory; that is a language requirement. The entire memory there is full of ints belonging to a single object, and we can iterate manually over it through char pointers, or we can turf it to memcpy. Whether we use arr or arr[0] ~~or obtain the address through a stack offset from some other variable~~ [<- not guaranteed correct as Peter Cordes reminded me] or some other magic or simply make an educated guess is entirely irrelevant as long as the address is correct, and of that there is no doubt here.
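The "iterate manually through char pointers, or hand it to memcpy" equivalence can be sketched as follows. This is an illustrative C++ example under the answer's own reading of the rules, not a proof of it. The function name `bytewise_copied_element` is hypothetical, and both byte pointers are derived from the address of the entire array object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies a whole int[3][3] one byte at a time through unsigned char
// pointers, checks that std::memcpy produces the same bytes, and returns
// element [r][c] of the byte-wise copy.
int bytewise_copied_element(int r, int c)
{
    assert(r >= 0 && r < 3 && c >= 0 && c < 3);   // stay inside the 3x3 array

    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int by_loop[3][3] = {};
    int by_memcpy[3][3] = {};

    // Both pointers come from the address of the complete array object.
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&arr);
    unsigned char* dst = reinterpret_cast<unsigned char*>(&by_loop);
    for (std::size_t i = 0; i < sizeof arr; ++i) {
        dst[i] = src[i];
    }

    std::memcpy(by_memcpy, arr, sizeof arr);
    assert(std::memcmp(by_loop, by_memcpy, sizeof arr) == 0);

    return by_loop[r][c];
}
```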

edited Sep 27 '21 at 08:49

answered Sep 26 '21 at 22:46

Peter - Reinstate Monica


2

Interesting - and a well-made case. But that's your C17 Draft Standard. What does the C++17 Standard have to say? – Adrian Mole Sep 26 '21 at 23:35
2

Remember that C (and C++) don't assume a flat memory model. Pointers don't have to be simple integers like 0x42. A hypothetical C++ on a segmented memory model (which *doesn't* extend the language with `far` pointers) might have a max object size of 64k, and arrange for single objects not to cross segment boundaries when accessed with a `seg:off` pointer derived in a valid way. (Because there can be multiple representations for a pointer to the same byte, but some of them would have an `off` component too close to wrapping around to iterate over the rest of the object). – Peter Cordes Sep 27 '21 at 01:00
See my answer on [Does C have an equivalent of std::less from C++?](https://stackoverflow.com/a/58322233) for some discussion of that kind of thing. But the key point is that you only get into trouble when you try to get a pointer into one object by adding or subtracting relative to a pointer to a different object. So it's still valid to `memcpy` because any pointers it creates will be derived from a pointer to the start of the entire 2D array object, and it must be smaller than the implementation's max object size (else you already have UB). – Peter Cordes Sep 27 '21 at 01:04
So TL;DR: I agree with the idea of the reasoning in this answer, but it goes a bit too far in its argument by assuming a flat memory model. Of course everything is easy in that case. As it is for accesses within a single object, using pointers derived from the start of it. However, just because something would make sense in asm does *not* mean it's legal in C. e.g. signed integer wraparound is UB in C, despite (almost?) every machine having add instructions that wrap, e.g. MIPS `addu`. IDK about old 1's complement machines, if any of them didn't have easy wrapping. – Peter Cordes Sep 27 '21 at 01:07
2

The Standards use the term "Undefined Behavior", among other things, as a catch-all to describe actions which might in some circumstances, on some implementations, behave in a manner that could not be practically described in a manner consistent with sequential program execution. If a wide range of actions would violate a constraint, but there was no conceivable reason for a compiler writer not to process some common corner cases usefully, Standards would generally rely upon compiler writers to exercise some common sense. The idea that UB was an unconditional invitation to nonsense... – supercat Sep 27 '21 at 02:38
2

...was a much more recent innovation, which might be okay if the Standards made any real effort to address places where an action is both defined and "undefined" by different parts of the Standard. – supercat Sep 27 '21 at 02:39
1

@AdrianMole I don't think anybody on the committee has thought about memcpy past 1978. OK, past 1985 when the first C89 draft was published. – Peter - Reinstate Monica Sep 27 '21 at 07:21
@PeterCordes Unfortunately you are right with address aliases and wraparounds in segmented architectures. (Are there still segmented memory architectures or can we all agree on a linear model, much like we now agree on 2's complement?) Crossing boundaries of top-level objects is UB. But I don't suggest doing that in my answer, and it surely doesn't apply to the OP example. – Peter - Reinstate Monica Sep 27 '21 at 07:36
I know you didn't, but the argument in this answer about how addresses work would make that ok, too. I think there is a valid argument to be made along these lines for memcpy, if you narrow it a bit to pointers derived in a valid way. And I see you've already done that; that's exactly what I was hoping for, thanks. – Peter Cordes Sep 27 '21 at 11:33
1

@Peter-ReinstateMonica: Hardware manufacturers may all agree on two's-complement, but compiler vendors aren't even willing to guarantee that overflow in a two's-complement multiplication won't disrupt the behavior of surrounding code even in cases where the result would end up being ignored, unless a flag is used to enforce precise two's-complement wraparound. – supercat Sep 27 '21 at 20:38

Ozob · Answer 4 · 2021-09-26T17:34:37.657

The question is about C++; I can only answer for C. In C, this is well-defined behavior. I'll be quoting from a December 11, 2020 draft of the C2x standard, found at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf; all emphasis will be as in the original.

The question is whether we can apply a memcpy to an int[3][3]. An int[3][3] is an array of arrays, while memcpy works on bytes. So we will need to know what the standard says about the representation of arrays as bytes.

We start with arrays. Section 6.2.5, "Types", paragraph 22, defines array types:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type.

An int[3][3] is therefore a contiguously allocated nonempty set of three int[3] objects. Each of those is a contiguously allocated nonempty set of three int objects.

Let's first ask about int objects. Everyone expects a memcpy of a single int to work. To see that the standard requires this, we look in section 6.2.6.1, "General", paragraph 2:

Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

So an int is a contiguous sequence of one or more bytes. Therefore our int[3][3] is a contiguous sequence of three contiguous sequences of three contiguous sequences of sizeof(int) bytes; the standard requires that it is 9 × sizeof(int) contiguous bytes.
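The size arithmetic above can be checked at compile time. The following is a small C++ sketch (the answer itself is about C, where `_Static_assert` would play the same role). The constant name `whole_array_bytes` is an assumption introduced only for this example.

```cpp
#include <cstddef>

// Size relationships implied by "contiguously allocated": no padding is
// allowed between the elements of an array.
static_assert(sizeof(int[3]) == 3 * sizeof(int),
              "a row is exactly three ints");
static_assert(sizeof(int[3][3]) == 3 * sizeof(int[3]),
              "the array is exactly three rows");
static_assert(sizeof(int[3][3]) == 9 * sizeof(int),
              "nine ints in total");

// Number of bytes a memcpy of the whole 2D array has to move.
constexpr std::size_t whole_array_bytes = sizeof(int[3][3]);
```

This only confirms the byte count that `sizeof arr` passes to `memcpy`. Whether the copy itself is well defined is what the rest of the answer argues.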

The standard also puts requirements on how these bytes relate to the array indices. Section 6.5.2.1, "Array subscripting," paragraph 2, says:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

So arr[1] == *((arr)+(1)) is the second int[3], arr[1][2] == *((*((arr)+(1)))+(2)) is its third element, and this must be the sixth int in arr, five ints past the start (1 × 3 + 2 = 5). Paragraph 3 is explicit about this:

Successive subscript operators designate an element of a multidimensional array object. If E is an n-dimensional array (n ≥ 2) with dimensions i × j × ··· × k, then E (used as other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with dimensions j × ··· × k. If the unary * operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the referenced (n − 1)-dimensional array, which itself is converted into a pointer if used as other than an lvalue. It follows from this that arrays are stored in row-major order (last subscript varies fastest).
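To make the row-major claim concrete, here is a hedged C++ sketch that copies the 2D array into a flat `int[9]` and looks up element `[r][c]` at flat index `r * 3 + c`. The function name `flat_element` is hypothetical, and the `memcpy` call is the very construct this answer argues is well defined.

```cpp
#include <cassert>
#include <cstring>

// Copies the 2D array into a flat int[9] and returns the flat element
// that corresponds to arr[r][c] under row-major layout.
int flat_element(int r, int c)
{
    assert(r >= 0 && r < 3 && c >= 0 && c < 3);   // stay inside the 3x3 array

    const int arr[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
    int flat[9] = {};
    static_assert(sizeof flat == sizeof arr, "both hold nine ints");
    std::memcpy(flat, arr, sizeof arr);

    return flat[r * 3 + c];   // last subscript varies fastest
}
```

With the values from the question, `flat_element(1, 2)` reads flat index 5, which holds the 6 stored in `arr[1][2]`.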

Despite this, you're still not allowed to access arr[0][4]. As Ted Lyngmo's answer notes, Annex J.2 specifically says:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

But since memcpy is really about bytes, it's okay. Its source and destination aren't multidimensional arrays but void *. 7.24.2.1, "The memcpy function," explains:

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

A "character" can have three meanings according to section 3.7. The relevant one seems to be "single-byte character" (3.7.1), and therefore memcpy copies n bytes. Hence memcpy(arr_copy, arr, sizeof(arr)) must copy arr to arr_copy correctly.

Though come to think of it, memcpy doesn't say that it copies n contiguous bytes. I suppose it could copy the same byte n times. Or pick n random bytes. That would make debugging ... interesting.

Andrew Henle · Answer 5 · 2021-09-26T15:19:05.527

3

Is copying 2D arrays with "memcpy" technically undefined behaviour?

(n.b., this only covers C, per the draft C11 standard at https://port70.net/~nsz/c/c11/n1570.html)

No, it is not.

TLDR Summary:

6.7.6.3 Function declarators (including prototypes), paragraph 7 defines decay of arrays to pointers in function calls. BUT that decay is done under the auspices of 6.9.1 Function definitions, paragraph 7, which states "... in either case, the type of each parameter is adjusted as described in 6.7.6.3 for a parameter type list; the resulting type shall be a complete object type."

That directly refutes the concept that the pointer that results from array decay when an array is passed as a function parameter does not refer to the entire array.

Detailed Answer

First, arrays are "complete objects".

Why arrays must be "complete objects"

(If someone can find a statement in the standard[s] defining arrays as "complete objects" this entire section of this answer is redundant.)

While not explicitly defined as such in the (draft) C11 standard (at least not anywhere that I have been able to find), arrays are implicitly "complete objects" in multiple statements, such as statements where arrays are explicitly removed from the "complete object" category:

6.5.2.2 Function calls, paragraph 1:

The expression that denotes the called function shall have type pointer to function returning void or returning a complete object type other than an array type.

6.7.2.1 Structure and union specifiers does not explicitly allow array members of structures and unions other than "flexible array members" in paragraph 18:

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. ...

The only paragraph of 6.7.2.1 Structure and union specifiers that covers array members in general is paragraph 9:

A member of a structure or union may have any complete object type other than a variably modified type.

That is the only statement in the (draft) C11 standard that allows for the inclusion of arrays in structures and unions.

Array initialization is covered by 6.7.9 Initialization, paragraph 3:

The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type.

That only covers arrays of fixed, known size via the category "complete object".

Function return values have arrays explicitly removed from the "complete object" category by 6.9.1 Function definitions, paragraph 3:

The return type of a function shall be void or a complete object type other than array type.

So, we have established that arrays are "complete objects".

Parameters to functions are "complete object types"

Per 6.9.1 Function definitions, Semantics, paragraph 7:

the type of each parameter is adjusted as described in 6.7.6.3 for a parameter type list; the resulting type shall be a complete object type.

Why "complete object" is important

6.5.2.1 Array subscripting, paragraph 1 states:

One of the expressions shall have type ''pointer to complete object type'', the other expression shall have integer type, and the result has type ''type''.

And per 6.9.1p7, the array was passed as a "complete object type", which means the pointer can be dereferenced to access the entire array.

Q.E.D.

edited Sep 26 '21 at 15:19

answered Sep 26 '21 at 15:09

Andrew Henle


2

@AdrianMole Thanks. But as I just added, this only covers C per the C11 draft standard. – Andrew Henle Sep 26 '21 at 15:20
Drive-by DV, care to explain your reasoning? – Andrew Henle Sep 26 '21 at 18:30
1

NMDV, and I have yet to form an opinion on most of the content of this answer, but its focus on the term "complete object" seems to indicate a mis-parse. "Complete object type" should be read as "complete (object type)", and it describes a type that is both an object type (as opposed to a function type) and a complete type (as opposed, for example, to an array type with one dimension unspecified). – John Bollinger Sep 26 '21 at 18:32
1

Now having read this answer more completely, I'm not buying it at all. Although I agree that the behavior of copying an array via a function call of the form `memcpy(dest_array, src_array, sizeof(src_array));` is well defined for arrays with any element type, including other array types, I don't accept that this argument establishes that. – John Bollinger Sep 26 '21 at 18:43
@JohnBollinger I think given more time I could rework this to be more precise. Either way, I think the logic remains - the standard implicitly defines arrays of known, fixed size as both "complete (object type[s])" and "(complete object) types". The lack of an explicit definition of arrays as either is irksome, though. In this case, I'd think 6.9.1p7's "the resulting type shall be a complete object type" covers both readings. Then there's the "50-year-old standardized behavior is not UB" argument... – Andrew Henle Sep 26 '21 at 18:44
In this context, "the resulting type shall be a complete object type" means that the pointer type to which `T[n]` is converted, `T *`, shall be a complete (object) type. And this is in fact the case even if `T` is not a complete type (or not an object type) itself. – John Bollinger Sep 26 '21 at 18:48
@JohnBollinger I don't read it that way given 6.7.6.3p7's "A declaration of a parameter as ''array of type'' ...". Per that, IMO the parameter **is** the "array of type". It seems to me the quotes would have been deliberately inserted so it reads "... (array of type)" instead of "... array of type". I think your reading aligns more closely to the latter. – Andrew Henle Sep 26 '21 at 18:59
6.7.6.3/7 says that if I write `void foo(int x[42]);` then it is as if I had written `void foo(int *x);`, instead. The fact that I originally expressed it in terms of an array type is not otherwise relevant. It means exactly the same thing as if I had originally written it as `void foo(int *x);` in the first place. Yes, array types are complete object types, but no, what is passed to the function is not the array, but the pointer. Indeed, that's mostly a separate issue, as it arises from array decay, not from anything specific to function calls. – John Bollinger Sep 26 '21 at 19:13
@JohnBollinger OK, now I see what you mean. However, [footnote 10](https://port70.net/%7Ensz/c/c11/n1570.html#note10) states those notations are equivalent, so the problem becomes where in the standard is such an equivalency established formally. Which, unfortunately, I don't have the time to address right now. And as I've alluded to in some of my other comments on this question, given the history here I'd say the burden of proof that UB is formally invoked here is on those who claim UB is invoked: quote the relevant portions of the standard[s] that support such an interpretation. – Andrew Henle Sep 26 '21 at 19:29
(cont) And just quoting 6.7.6.3p7 without addressing the quoted "array of type", or reading 6.3.2.1p3 and stopping, does not even begin to establish that the resulting pointer does ***not*** refer to the entire array. IMO that ignores both the history of C (in the case of this answer) and, IMO, the intent of the standard[s]. – Andrew Henle Sep 26 '21 at 19:37
3

Again, I agree with you that the behavior in question is well defined. But no, I'm still not buying your argument for that. I think we agree that the question is whether the pointer must be interpreted as pointing to only one object for `memcpy()`'s purposes, and I think the fact that `memcpy()` receives the result of its conversion to `void *`, which carries no information about the pointed-to type, is probably a better handle on that. – John Bollinger Sep 26 '21 at 19:43
5

Complete object and complete types are unrelated except by unfortunate choice of names. The former refers to objects that aren't subobjects, and the latter refers to types that are completely defined (e.g. not only declared, not array of unknown bounds) or cv `void`. "(Complete object) types" is nonsensical as all objects can be subobjects regardless of their type. – Passer By Sep 27 '21 at 07:36
@PasserBy Well, given you appear to be invoking the C++ standard with "cv `void`", you're probably right. I limited myself to C in this answer because I don't have anywhere near the familiarity with C++ that I have with C. And the C standard seems to be woefully lacking in how it defines "complete ..." anything. Given the nature of C++, I suspect it has to define things much more stringently. As I noted in my answer, I couldn't even find where C explicitly defines arrays as complete objects. – Andrew Henle Sep 27 '21 at 12:28
@PasserBy (cont) The closest implicit definition in C11 appears to me to be 6.7.6.2p4: "If the size is not present, the array type is an incomplete type.", implying that if "the size" is present, the array is a complete type. – Andrew Henle Sep 27 '21 at 12:30
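The parameter adjustment discussed in the comments above (a parameter written as `int x[42]` means `int *x`) can be observed directly in C++, which has the same adjustment rule as C. This is only a sketch: it uses `std::is_same` from `<type_traits>`, and the two constant names are assumptions introduced for the example.

```cpp
#include <type_traits>

// True if a parameter declared as "int x[42]" yields the same function
// type as one declared as "int *x".
constexpr bool array_param_is_pointer =
    std::is_same<void(int[42]), void(int*)>::value;

// The same check for the 2D parameter used by print() in the question.
constexpr bool array2d_param_is_row_pointer =
    std::is_same<void(const int[][3], int), void(const int (*)[3], int)>::value;

static_assert(array_param_is_pointer, "int x[42] is adjusted to int *x");
static_assert(array2d_param_is_row_pointer,
              "const int arr[][3] is adjusted to const int (*arr)[3]");
```

Both assertions hold because the adjustment happens before the function type is formed, so the function never sees an array type for that parameter.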

score 2 · Answer 6 · edited Sep 29 '21 at 16:21

The indicated use of memcpy will be processed meaningfully by any compiler whose authors don't abuse the Standard as an excuse to regard useful constructs as "broken". The only people who should care about whether the Standard actually defines it without contradiction would be compiler writers who abuse the Standard, or those seeking to protect themselves against compiler writers that abuse the Standard. If the C or C++ Standards were intended to be immune to abuse, it might be worth worrying about whether they 100% unambiguously specify all of the cases where memcpy should work. Both are written, however, to be reliant upon compiler writers to recognize that if a Standard simultaneously specifies how some constructs work, but characterizes an overlapping set of constructs as invoking Undefined Behavior, compilers should make a good faith effort to process code as usefully as practical.

Consider the two functions:

#include <string.h>

char arr[4][4][4];

int test1(int i, unsigned mode)
{
  arr[1][0][0] = 1;
  memcpy(arr[0][i], arr[2][0], mode & 4);
  return arr[1][0][0];
}

int test2(int i, unsigned mode)
{
  arr[1][0][0] = 1;
  memcpy(arr[0]+i, arr[2], mode & 4);
  return arr[1][0][0];
}

Depending upon what the programmer is trying to do, any of the following interpretations might be most useful:

Process both functions in a way that reloads the value of arr[1][0][0] after the memcpy.
Process both functions in a way that returns 1 unconditionally without regard for whether memcpy overwrote it.
Process the first function in a manner that unconditionally returns 1, but process the second in a manner that reloads arr[1][0][0], on the basis that while the Standards define the use of index operators on array lvalues/glvalues in terms of array decay followed by pointer indexing, programmers' choice of syntax is often based upon whether an array-type lvalue/glvalue is actually being used as an array, or is being used as a means of getting a pointer to the first element, which will then be used as the base for further address calculations.

If a compiler were to attempt to process the code meaningfully in the case where i and mode are both 4, there would be no bona fide ambiguity about how the code should behave: only one behavior would make sense. The only ambiguity is whether the benefits of accommodating that case would be worth the execution cost of doing so; accommodating the behavior is always a "safe" choice. Writing n for the copy length (mode & 4, so either 0 or 4), it would be awkward to write the Standard to say that test1 should have defined behavior for i==0..3 when n is 4, and i==0..4 when n is zero, but test2 should have defined behavior for i==0..15 regardless of n; for most purposes, though, the best blend of semantics, compatibility, and optimization would be achieved by processing code in that fashion.

Ted Lyngmo · Answer 7 · 2021-09-27T12:08:14.317

My current standpoint is that when an int[3][3] is passed as an argument to a function, it decays into a pointer to the first element in that array. The first element is an int[3] and the other two int[3]s are in range - just like when you pass a 1D int[3] to a function, you get a pointer to the first int and the other two ints are in range, hence the memcpy is safe.
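The standpoint above can be sketched in C++ as follows. The function name `rows_span_whole_array` is hypothetical. The check compares two one-past-the-end pointers for equality and never dereferences them.

```cpp
#include <cassert>
#include <type_traits>

// int[3][3] decays to a pointer to its first row, int(*)[3].
static_assert(std::is_same<std::decay<int[3][3]>::type, int (*)[3]>::value,
              "a 2D array decays to a pointer to its first row");

// Returns true if stepping the decayed pointer over all three rows ends
// exactly at the one-past-the-end address of the whole array.
bool rows_span_whole_array()
{
    int arr[3][3] = {};
    int (*first_row)[3] = arr;            // array-to-pointer decay

    // The other two rows are reachable from the pointer to the first row.
    assert(first_row + 1 == &arr[1]);
    assert(first_row + 2 == &arr[2]);

    const void* after_rows  = first_row + 3;   // one past the last row
    const void* after_array = &arr + 1;        // one past the whole array
    return after_rows == after_array;
}
```

This mirrors the 1D case: a pointer to the first `int` of an `int[3]` can be advanced over the remaining two elements in the same way.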

Original answer:

This answer is based on some wrong assumptions I made by reading something a long time ago. I'll leave the answer and the comments up to perhaps prevent other people from walking into the same mind-trap.

What is passed to the function decays into pointers to the first elements, that is, in this case, two int(*)[3]s.

C draft Annex J (informative) Portability issues J.2 Undefined behavior:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

memcpy(arr_copy, arr, sizeof arr); gets two int(*)[3]s and will access both out of range; hence, UB.

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/237569/discussion-on-answer-by-ted-lyngmo-is-copying-2d-arrays-with-memcpy-technicall). — user229044, Sep 27 '21 at 18:47
The comments that have been moved to chat contain my conversion from thinking it's UB to, "no it's not" and the arguments (made by various people) that made me rethink my original thoughts on this matter. — Ted Lyngmo, Sep 28 '21 at 09:24

score -7 · Answer 8 · answered Sep 26 '21 at 12:05

C++ standard says ([cstring.syn]/1):

The contents and meaning of the header <cstring> are the same as the C standard library header <string.h>.

C11 7.24.2.1 The memcpy function says:

Synopsis

1
         #include <string.h>
         void *memcpy(void * restrict s1,
              const void * restrict s2,
              size_t n);
Description

2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1…

Given this description, one may wonder what if n is greater than the size of the object pointed to by s1/s2. «Common sense» suggests that copying more than, say, sizeof(int) bytes from an int object should be meaningless.

And indeed, there is 7.24.1 String function conventions p.1 saying:

The header <string.h> declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type. … Various methods are used for determining the lengths of the arrays, but in all cases a char * or void * argument points to the initial (lowest addressed) character of the array. If an array is accessed beyond the end of an object, the behavior is undefined.

Thus, when passing a pointer to the first element of an array, it is «the object» from memcpy p.2 and trying to copy more bytes than this object has is UB.

Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/237570/discussion-on-answer-by-language-lawyer-is-copying-2d-arrays-with-memcpy-techn). — user229044, Sep 27 '21 at 18:47
