2

According to the C standard (SO link 1 and link 2), we cannot access an element of a row using an out-of-bounds index:

int x[10][10] = ...; // initialize x
int q = x[0][10]; // This is an out-of-bounds access

Then is it valid to initialize the array using the following loop?

int *p = &x[0][0];
for (int i = 0; i < 100; ++i)
  p[i] = 0;

If this is not valid, then is it valid to initialize x using memset(&x[0][0], 0, sizeof(x))?

int *p = &x[0][0];
memset(p, 0, sizeof(x));

edit: I wonder whether the answers are different in C++ as well..! :)

aqjune
  • 14
    Do you want to know for C or C++? They are different languages. – NathanOliver Aug 11 '21 at 18:37
  • This `memset` could be invalid, but `memset(&x, 0, sizeof x)` is more likely to be valid. – HolyBlackCat Aug 11 '21 at 18:37
  • How are the first two snippets related? They do completely different things. – Gerhardh Aug 11 '21 at 18:37
  • 1
    @HolyBlackCat `x` is an array which already decays to a pointer. Why would you want `&x`? – Gerhardh Aug 11 '21 at 18:38
  • 3
    Your `memset` is perfectly valid. But you must be aware that it only works for all bytes having same value. For `0` that is obviously fine, but filling the array with `0x1234` is not possible using `memset`. – Gerhardh Aug 11 '21 at 18:40
  • 7
    C++ Side note: `int x[10][10] = {};` – user4581301 Aug 11 '21 at 18:41
  • @Gerhardh Does it imply that memset cannot be implemented like the for loop? – aqjune Aug 11 '21 at 18:41
  • @aqjune, memset() operates on individual bytes. So for multi-byte types like `int`, it can only ever set all bytes of each `int` to the same value. In other words, there are only 256 possible values for an `int` that can be assigned using memset: 0x00000000, 0x01010101, etc... –  Aug 11 '21 at 18:43
  • 1
    Found one more related question for you [here](https://stackoverflow.com/questions/7269099/may-i-treat-a-2d-array-as-a-contiguous-1d-array), unfortunately with just as vague answers as in the other links. – rustyx Aug 11 '21 at 18:51
  • @Gerhardh Had a brain fart. Yes, `x` should be equally valid, unlike `x[0]`. – HolyBlackCat Aug 11 '21 at 18:51
  • As I wrote: `memset` can be used if you want all bytes same. `int` value `0` is all bytes `0` which means you can use `memset`. `int` value `1` is (for 32 bit `int`) either `00 00 00 01` or `01 00 00 00` which means you cannot use `memset` as it does not fill patterns with different bytes. – Gerhardh Aug 11 '21 at 19:13
  • aqjune, _initialize_ has a special meaning in C. `int x[10][10] = {{0}};` is an example of _initialization_. `memset(p, 0, sizeof(x))` is _assignment_. So you cannot use `memset()` to _initialize_. – chux - Reinstate Monica Aug 11 '21 at 19:17
  • Don't dual tag, C and C++ are different languages with different rules – M.M Aug 12 '21 at 05:52
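
A minimal sketch of the byte-wise behavior described in the comments above. The helper name `int_after_memset` is hypothetical and only serves the illustration; it sets every byte of a single `int` to the same value and returns the result.

```c
#include <string.h>

/* Hypothetical helper: returns the int produced when every byte of an
   int is set to byte_value with memset. */
int int_after_memset(int byte_value) {
    int v;
    memset(&v, byte_value, sizeof v);  /* writes the same byte sizeof(int) times */
    return v;
}
```

Assuming a 4-byte `int` without padding bits, `int_after_memset(0)` gives 0, while `int_after_memset(1)` gives 0x01010101 rather than 1, which is why `memset` is only practical for zeroing.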

2 Answers

1

The loop is not valid, see the comments by @EricPostpischil .

Yes, the memset approach is valid, but it is not the preferred solution. It operates on individual bytes, so it can only reasonably be used for zeroing memory.

C++

  • Value initialization T array[10][10] = {}; zeroes an array of primitive types and calls the default constructors for class types.
  • std::fill(p,p+100,value) for assigning a specific value. There is no standard way to initialize an array to non-zero values without listing them.
  • std::array<T,N> is the preferred way for arrays of known size.

C

cppreference

There is no special construct in C corresponding to value initialization in C++; however, = {0} (or (T){0} in compound literals) (since C99) can be used instead, as the C standard does not allow empty structs, empty unions, or arrays of zero length.

So, in case of nested arrays, use T array[10][10] = {{0}};.
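
As a rough sketch of the two C options mentioned in this answer, the following compares initialization with `= {{0}}` against zeroing with `memset` after the declaration. The function name `sum_after_zeroing` is made up for this example.

```c
#include <string.h>

/* Hypothetical helper: zeroes two arrays in different ways and returns
   the sum of all their elements, which should be 0. */
int sum_after_zeroing(void) {
    int a[10][10] = {{0}};     /* initialization: unlisted elements are zeroed */
    int b[10][10];
    memset(&b, 0, sizeof b);   /* assignment after the fact, byte by byte */

    int sum = 0;
    for (int r = 0; r < 10; ++r)
        for (int c = 0; c < 10; ++c)
            sum += a[r][c] + b[r][c];
    return sum;
}
```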

Quimby
  • 2
    As discussed in the links in the question, the code `int *p = &x[0][0]; for (int i = 0; i < 100; ++i) p[i] = 0;` does not have behavior defined by the C standard. As you note, `p[i]` is defined to be `*(p+i)`, but `p+i` is not defined when it would point to an object outside of `x[0]`. The C standard defines addition to pointers such that `p+i` is defined when the result points to an element from `x[0][0]` to `x[0][10]` (the latter being a notional endpoint that we are allowed to point to, even though the element does not properly exist). Outside of that, the behavior of `+` is not defined. – Eric Postpischil Aug 11 '21 at 19:46
  • @EricPostpischil That's great, then is the custom implementation of memset probably undefined? Also for std::fill; it does not use unsigned char, so probably even more fishy (https://en.cppreference.com/w/cpp/algorithm/fill) – aqjune Aug 12 '21 at 03:19
  • @aqjune `std::fill` is perfectly valid, the not using `unsigned char` is the point. That allows setting the elements to arbitrary values. – Quimby Aug 12 '21 at 06:20
  • @EricPostpischil Okay, I did not know that. But to be pedantic since `p+10` is allowed and "by coincidence" it also point to a first element of the next array, can it be safely incremented? Also do you know of any way how can this break in practice please? – Quimby Aug 12 '21 at 06:22
  • 2
    @aqjune: Using `memset` works (has defined behavior) because `memset` is specified to work (as if) by copying characters, and the C standard gives special treatment to pointers to characters and to using character types as lvalues. C 2018 6.3.2.3 7 says we can convert a pointer to an object (like `&x`) to a pointer to a character type and use it to address the bytes of the object. So `(char *) &x` effectively produces a pointer into an array of all the bytes of `x`, so address arithmetic can be performed with it throughout `x`. The `int *` of `int *p = &x[0][0];` does not enjoy this latitude. – Eric Postpischil Aug 12 '21 at 09:31
  • 2
    @aqjune: That means `memset(&x, 0, sizeof x)` will work, and even `MyMemset(&x, 0, sizeof x)` would work where `MyMemset` is implemented with code that copies character by character. However, the question asks about `memset(&x[0][0], 0, sizeof(x))`. Depending on how one wishes to interpret the C standard, `&x[0][0]` is not properly a pointer to `x`, so maybe `(char *) &x[0][0]` just produces a pointer we can use to access the bytes of `x[0][0]`, not all the bytes of `x`. That is a stricter reading of the standard than most people would use. Practically, I would expect it to work. – Eric Postpischil Aug 12 '21 at 09:36
  • 1
    @aqjune: As for deducing that `p+10` points to `p[1][0]` and therefore can be incremented to point to `p[1][1]`, this is not valid. It uses a transitive property of pointers that is not given by the C standard. The rules for pointer arithmetic arose because not all architectures used flat address spaces in which pointers are simple byte numbers. They had segment-and-offset forms of addressing and other addressing schemes. With segment-and-offset addressing, pointer arithmetic is only valid within the range of the offset. Once you increment beyond that, the address you get is not the next byte… – Eric Postpischil Aug 12 '21 at 09:41
  • 1
    … in memory but is something else because of how the addressing scheme works. That means, to make addressing within an array work, you had to set the segment part correctly so the offset would span all the elements of the array. So `&a[0][i]` might use one segment number for which `i` works from 0 to 10, but maybe `&a[0][16]` goes outside the segment bounds and screws up the address. `&a[1][i]` would use a different segment number, so `i` would work from 0 to 10 for it. `&a[0][16]` and `&a[1][6]` would point to different places in memory. – Eric Postpischil Aug 12 '21 at 09:43
  • @EricPostpischil Thank you for the thorough explanation, I edited the answer accordingly. – Quimby Aug 12 '21 at 09:47
  • @aqjune: Such addressing schemes are rare, but modern compilers take advantage of these rules when optimizing. Suppose we passed `&a[0]` to a routine such as the `bool exists_in_table(int (*p)[10]) { for (int i = 0; i <= 10; ++i) if (p[0][i]) return true; return false; }`, similar to but changed from the first routine [here](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633). There is a bug here; `i <= 10` should be `i < 10`. As explained at the link, a compiler may optimize the routine to `return true;`; it can assume `p[0][10]` is anything it wants it to be… – Eric Postpischil Aug 12 '21 at 09:48
  • … It does not have to evaluate `p[0][10]` by loading `p[1][0]` from memory. That is, if it were true that `p[0][i]` accessed the next element in memory of the containing array, then the code in that routine would be defined to check the first eleven `int` elements starting where `p` points. But it is not defined that way, so the code in that routine does not have defined behavior and may be optimized to `return true;`. So you cannot rely on being able to access the elements of `x[10][10]` as an array of 100 `int`. – Eric Postpischil Aug 12 '21 at 09:49
  • @Quimby: You changed the answer to say the loop is not valid in C but is valid in C++. C++ has the same rules for pointers. The `for` loop is not valid in C or C++. – Eric Postpischil Aug 12 '21 at 09:53
  • @EricPostpischil Sorry, you are correct (again), it's even on [cppreference](https://en.cppreference.com/w/cpp/language/operator_arithmetic). Edited the answer (again). – Quimby Aug 12 '21 at 10:06
  • _So, in case of nested arrays, use `T array[10][10] = {{0}};`_ I believe you can also just use `= {0}` instead. – mediocrevegetable1 Aug 12 '21 at 10:11
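
A sketch of the character-by-character `MyMemset` idea from the comments above, assuming the pointer passed in is a pointer to the whole array (`&x`) rather than `&x[0][0]`. The names `my_memset` and `zero_grid` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical character-by-character memset. Addressing the bytes of an
   object through an unsigned char pointer is what the comments rely on. */
void *my_memset(void *dest, int value, size_t count) {
    unsigned char *bytes = dest;
    for (size_t i = 0; i < count; ++i)
        bytes[i] = (unsigned char)value;
    return dest;
}

/* Zeroes a 10x10 array through a pointer to the whole array object. */
void zero_grid(int (*grid)[10][10]) {
    my_memset(grid, 0, sizeof *grid);
}
```

A call would look like `zero_grid(&x);` for `int x[10][10];`.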
0

You asked in title:

Can I use memset to initialize a 2 dimensional array?

Yes, as memset(...) only cares about byte count, you could use it like:

int x[10][10];
memset(&x, 0, sizeof(x));

But in C++, an empty initializer list would do the same, like:

int x[10][10] = {};

And in C since C99 (as mentioned in comments), we could instead do:

int x[10][10] = {{0}};

Anyway, you say:

int q = x[0][10]; // This is an out-of-bounds access

Then you asked whether the following is valid:

int *p = &x[0][0];
for (int i = 0; i < 100; ++i)
   p[i] = 0;

Well, yes, but only because you are doing something as simple as setting everything to zero; more complex logic could require a two-dimensional loop, like:

for (int x = 0; x < 10; ++x) {
    for (int y = 0; y < 10; ++y) {
        myArray[x][y] = something_more_complex_here;
    }
}
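
A concrete version of the two-dimensional loop, as a sketch only: the function name `fill_grid` and the value `row * 10 + col` are placeholders chosen for illustration. Each index stays within its own dimension, so no row is ever indexed past its end.

```c
/* Hypothetical example: store row * 10 + col in every element. */
void fill_grid(int grid[10][10]) {
    for (int row = 0; row < 10; ++row) {
        for (int col = 0; col < 10; ++col) {
            grid[row][col] = row * 10 + col;
        }
    }
}
```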
Top-Master
  • 3
    `int x[10][10] = {{0}};` works fine in C since C99. `memset(&x, 0, sizeof x);`, without `()` is [drier](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). – chux - Reinstate Monica Aug 11 '21 at 19:11
  • *`memset(&x, 0, sizeof x);`, without () is drier.* Since there are cases where parentheses are required on `sizeof` to get proper results, that means code will have two styles of using `sizeof`. Thus making code that much harder to understand. If you ***always*** use the same style for a code construct, it's much easier and faster to understand. Meaning it's much less bug-prone. Use parentheses on `sizeof()`. – Andrew Henle Aug 11 '21 at 19:32
  • And you [even have to put parenthesis on `sizeof( type-name )` such as `sizeof( void * )` or it's a syntax error](https://port70.net/~nsz/c/c11/n1570.html#6.5.3p1). – Andrew Henle Aug 11 '21 at 19:42
  • @AndrewHenle I agree with you, but I don't think you'll win this battle. There's a class of C programmers for whom `sizeof(variable)` versus `sizeof variable` is an absolute [shibboleth](https://en.wikipedia.org/wiki/Shibboleth). – Steve Summit Aug 11 '21 at 19:44
  • @SteveSummit Maybe they should spend some time thinking about how bugs get created and propagated, and how human minds perceive and understand code, and write code in patterns that make the mind's job easier and help prevent the creation of bugs in the first place... – Andrew Henle Aug 11 '21 at 19:54
  • Hmm... I personally was quite surprised the day I noticed that parenthesis are not a required must for `sizeof` and was quite disappointed in the language design because types like `void *` cause issues (and now I had to manually ensure parenthesis are never missing) – Top-Master Aug 11 '21 at 20:50
  • @chux-ReinstateMonica Anyway, assume the code above, with its `sizeof`, gets put into a macro someday (by a genius), and `x` is the macro's parameter (I am sure you get the reason for always using parentheses). – Top-Master Aug 11 '21 at 20:58
  • 1
    `sizeof(object)` or `sizeof object`: best to follow your group's coding style guide - whatever that is. – chux - Reinstate Monica Aug 11 '21 at 21:40
  • @Andrew Henle Sometimes arithmetic operations need parentheses to force a certain order of operations. That doesn't mean that you should use them even when they're unnecessary. Furthermore, I use the parentheses as a visual cue that the code is likely to be brittle since you should almost always prefer `sizeof variable` over `sizeof (type)`, and that also prevents bugs. – jamesdlin Aug 12 '21 at 09:55
  • @jamesdlin So what does `sizeof variable + anothervariable` mean? How does leaving out the parenthesis avoid bugs there? Yeah, it doesn't. – Andrew Henle Aug 12 '21 at 09:58
  • @AndrewHenle If you think that `sizeof variable + anothervariable` expression might be evaluated as `sizeof (variable + anothervariable)`, then there's no reason why you'd think that `sizeof (variable) + anothervariable` wouldn't be `sizeof ((variable) + anothervariable)`. – jamesdlin Aug 12 '21 at 10:05
  • _`int x[10][10] = {{0}};`_ I don't think you need the extra set of braces, I'm pretty sure `= {0}` will work just fine as well. – mediocrevegetable1 Aug 12 '21 at 10:15
  • @jamesdlin You missed my point: if you want `sizeof( variable + anothervariable )` or anything similar, you ***must*** have parenthesis as in `sizeof()`. Likewise, to get the size of a type, you ***must*** use `sizeof( type )`. Since parenthesis are ***required*** in some situations, the ***only*** consistent usage of `sizeof()` is to use parenthesis all the time. But hey, if you want to have to mentally parse an additional style of code and make your job that much harder and your code that much more bug-prone, that's on you. – Andrew Henle Aug 12 '21 at 16:14
  • As I said, well-written C code rarely should need to use `sizeof (type)`. Doing `sizeof (variable + anothervariable)` is no different than using parentheses with any other unary operator (e.g. `-(variable + anothervariable)`). Ultimately all cases that *need* parentheses with `sizeof` should be rare and suspicious, and I consider drawing attention to them with extra mental effort to be a good thing. – jamesdlin Aug 12 '21 at 20:33
  • Once upon a time I spent hours wondering why `sizeof T` did not compile, in the end the compiler was expanding the `T` template-arg without considering `sizeof` (and guess what, if that person did use parenthesis, I would not need to add them myself then recompile). – Top-Master Aug 12 '21 at 20:42
  • Just because something is allowed by standard and works for you, it does not mean that it works on every compiler and for everyone else (which is yet another reason for using parenthesis). – Top-Master Aug 12 '21 at 20:46
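
For readers following the `sizeof` discussion above, here is a small sketch of how the expressions being argued about are parsed. The function names are hypothetical and exist only for the comparison.

```c
#include <stddef.h>

/* sizeof binds tighter than +, so this is (sizeof value) + extra. */
size_t size_plus_unparenthesized(int value, size_t extra) {
    return sizeof value + extra;
}

/* Parentheses around the operand alone change nothing:
   this is still (sizeof(value)) + extra. */
size_t size_plus_parenthesized(int value, size_t extra) {
    return sizeof(value) + extra;
}

/* Only wrapping the whole sum makes sizeof apply to the sum, whose type
   here is long long after the usual arithmetic conversions. */
size_t size_of_sum(int value, long long extra) {
    return sizeof(value + extra);
}
```

The first two functions return the same result for the same arguments; only the third measures the type of the sum.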