8

Given the following contrived example code:

#include <stdio.h>

struct abc
{
    int x[5];
    int y[5];
};

void main()
{
    struct abc test;
    test.y[0] = 10;
    printf("%d\n", test.x[5]);
}

The output of the program is 10.

While not the best programming practice, this does work. However, is this an artifact of the compiler and platform, or is this legal code (i.e., is its behavior defined by the C standard)?

Even if the result is not guaranteed to be 10, is there ever an instance where this would be "illegal" (i.e. writing to memory I do not "own")?

David Pfeffer
  • 3
    There is a very good chance that the same code is going to break if you try it with `char` instead of `int`, and keep an odd size of the array. – Sergey Kalinichenko Dec 13 '11 at 14:40
  • 6
    `void main()` is **not valid C or C++**. – Kerrek SB Dec 13 '11 at 14:50
  • Let me ask another question: why on earth would you think you **need** to do something like that ? – ereOn Dec 13 '11 at 14:56
  • 1
    @KerrekSB: And yet many, many embedded environments expect a `void main(void)` where the return value of `main` has no meaning because programs are meant to run indefinitely after they are started. – Brian McFarland Dec 13 '11 at 15:34

4 Answers

12

No, it's neither legal nor guaranteed to work. The compiler could add padding inside the struct to aid alignment, depending on the architecture.
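
Whether padding actually sits between the two arrays can be checked rather than guessed. The following is a minimal sketch, not part of the original answer; the helper name `padding_between_x_and_y` is made up for illustration. Note that even a result of zero does not make `test.x[5]` well defined, it only tells you where `y` happens to start on this implementation.

```c
#include <stddef.h>  /* offsetof, size_t */

struct abc
{
    int x[5];
    int y[5];
};

/* Hypothetical helper: number of padding bytes between the end of x
   and the start of y. x is the first member, so it sits at offset 0. */
size_t padding_between_x_and_y(void)
{
    return offsetof(struct abc, y) - sizeof(int[5]);
}
```

Calling `padding_between_x_and_y()` from `main` and printing the result with `%zu` shows the layout your compiler chose for this particular build.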

Edit: To sum up some of the stuff in these comments and clarify...

I do believe you "own" the memory there, since as edA-qa mort-ora-y points out, memcpy() of a struct needs/is expected to work. Where this is specifically guaranteed though, I'm not sure.

That being said, undefined behavior is something to avoid at all costs. What a program with undefined behavior does could change between two separate runs of the same code five seconds apart. It could cause subtle memory corruption in your program, a segfault, or run just fine, but there's no reason to ever use code that relies on undefined behavior.

Dan Fego
  • It's legal, just not defined behavior. – Pubby Dec 13 '11 at 14:39
  • This also violates aliasing rules. – Oliver Charlesworth Dec 13 '11 at 14:41
  • 6
    @Pubby: What's your definition of "legal" (*"won't land you in jail"*?) – NPE Dec 13 '11 at 14:42
  • 1
    I do "own" the entire block of memory though, correct? i.e. there's nothing illegal about writing `test.x[5] = 10`, even if there's no guarantee that the value will appear at `test.y[0]`? I'm trying to distinguish between legal and a very, very bad idea. – David Pfeffer Dec 13 '11 at 14:42
  • @aix If it were strictly illegal then the compiler should choke on it. Although I suppose since it's UB it could go either way. – Pubby Dec 13 '11 at 14:46
  • Related reading: [Common gotchas when writing your own p/invoke](http://blogs.msdn.com/b/oldnewthing/archive/2009/08/13/9867383.aspx) – GSerg Dec 13 '11 at 14:46
  • 1
    @DavidPfeffer: what people meant here, is that even if you own the memory block, there is **no guarantee** about what you will read using this trick. – ereOn Dec 13 '11 at 14:48
  • @Pubby: Writing `test.y[10] = 10;` would be illegal though, causing a segfault, but will compile. – David Pfeffer Dec 13 '11 at 14:48
  • 1
    @DavidPfeffer: It will not necessarily cause a segfault. Anything could happen. This is what undefined behavior means and this is why you should avoid it. – ereOn Dec 13 '11 at 14:49
  • 3
    I no longer think in C++11 this is undefined behaviour. This is a standard layout type and that means copying via memcpy is defined, thus there must be some guarantee about the memory in the padding space. Though I completely agree that value is undefined and has no guarantee about rolling over into the next member. But it must be safe to access the memory located there due to the standard layout requirements. – edA-qa mort-ora-y Dec 13 '11 at 14:59
  • @edA-qamort-ora-y: Nice to know. My concern is: perhaps it **is** safe in C++11, but is it really a good idea anyway ? I really fail to see why someone would need to do that. – ereOn Dec 13 '11 at 15:08
  • @ereOn, I would assume curiosity. Just wait 'till the OP finds out he can also modify his own instructions ;) – Vladislav Zorov Dec 13 '11 at 15:19
  • I think that `test.x[5] = 10` is UB. `test.x+5` is the one-off-the-end pointer, and accessing through a one-off-the-end pointer is UB, regardless of whether you "own" the memory or not. However, if `test.x+5 == test.y` on your particular implementation, which most likely is publicly documented in the struct layout rules, then by good fortune you are using a valid pointer value after all, even though it was obtained in a way not guaranteed to produce a pointer you can access as an `int*`. OTOH `*(unsigned char*)(test.x+5) = 10` *is* allowed precisely because you own the memory. – Steve Jessop Dec 13 '11 at 15:19
  • Or to put it another way, @edA-qamort-ora-y: the fact that you can memcpy POD/standard layout objects means it's safe to access using `unsigned char*`, and also using `char*` in C++. I don't think it's guaranteed safe to access using `int*`, but I also don't expect it to fail to work on any "normal" implementation. – Steve Jessop Dec 13 '11 at 15:22
  • For C99, the only case in which this can lead to UB is when the bit pattern stored at `x[5]` is a trap representation for `int`. So first, your platform has to have such a trap representation (I don't know of any commodity architecture that has one), and then the corresponding bits must be uninitialized for some reason: either because you have padding between the members or because you didn't initialize `y[0]`. So the question of whether it is "legal" is more or less theoretical. More important is the question of whether this gives you the element that a naive reader may think it does: No, it doesn't, so it is extremely dangerous. – Jens Gustedt Dec 13 '11 at 15:35
  • @JensGustedt Maybe my definition of UB is wrong, but if we have some padding there we may read completely different values every time we run the program. That sounds very much like undefined behavior to me. – Voo Dec 13 '11 at 16:45
  • @Voo, no, it is not UB, but the value is unspecified. These are two different things in C. UB means that anything can happen, from eating your hard drive to emptying your bank account. An unspecified value means just that: you can have any value in the variable. This is why such things can go unnoticed for a long time. Things that crash the application immediately are much nicer :) – Jens Gustedt Dec 13 '11 at 16:54
  • @Jens Where does the standard distinguish between "value is unspecified" and "this is UB"? I mean signed overflow is UB although it's rather similar to this situation. – Voo Dec 13 '11 at 17:32
  • The standard (C99) distinguishes between "undefined behavior" (3.4.3) and "unspecified behavior" (3.4.4). The latter explicitly mentions "use of an unspecified value". And no, an overflow is something different. If you start with a positive `int` value and only increment, e.g., a compiler is allowed to assume that the value will always be positive and may do optimizations in that sense. If the value is unspecified, the compiler can't and shouldn't assume anything. – Jens Gustedt Dec 13 '11 at 17:55
5

This is undefined behaviour; you are (un)lucky in this case. Furthermore (aside from the padding issue already mentioned), there is a maintainability issue: the code is incredibly fragile. What if someone later adds another member between `x` and `y`? I'm sure it's a contrived example, but the recommendation is: don't do it.
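
If the actual goal is to treat the ten values as one contiguous sequence, one way to avoid both the undefined behaviour and the fragility is to store them in a single array and expose the two halves through accessors. This is only a sketch of one possible design, not something from the question; the names `X_LEN`, `Y_LEN`, `abc_x` and `abc_y` are invented for illustration.

```c
enum { X_LEN = 5, Y_LEN = 5 };

struct abc
{
    int xy[X_LEN + Y_LEN];  /* one array, so every index 0..9 is in bounds */
};

/* Hypothetical accessors: pointers to the first and second half. */
int *abc_x(struct abc *s) { return s->xy; }
int *abc_y(struct abc *s) { return s->xy + X_LEN; }
```

With this layout, `abc_x(&test)[5]` and `abc_y(&test)[0]` name the same element, and both accesses stay inside a single array object, so no padding can appear between them.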

Nim
4

EDIT: As pointed out by others, this is not legal, as it results in undefined behaviour. I've removed this sentence from my answer.

This has the potential to result in undefined behaviour. You've allocated a chunk of memory 10 ints long in the struct abc, so indexing x at position 5 (the 6th item) will take you to y[0], as you've noted, in THIS specific case.

Where you can run into problems is when the C compiler lays out the structure in a way that you do not expect. This is called structure padding, and it exists for data alignment. Many processors access an object most efficiently (or only) when it sits at an address that is a multiple of its alignment, so the compiler may insert unused bytes between members to keep each one aligned. Let's use an example:

struct abc {
    int a;
    char b;
    int c;
};

What do you expect the size of this struct to be? If an int is 32 bits and a char is 8 bits, the members add up to 32 + 8 + 32 = 72 bits. However, you will find that on many systems this structure is actually 96 bits in size. The reason is that the compiler inserts 24 bits of padding after char b so that int c starts at an offset that satisfies its alignment requirement.

This can be extremely confusing when you declare a structure in two different places, and one gets padded while the other does not due to compile-time options or configuration.

Look up structure padding and data alignment for more information.
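
The sizes above can be checked on your own system. The following is a small sketch using `sizeof` and `offsetof`; the helper names are made up for illustration, and the numbers it reports depend on the compiler, target and packing options, so no particular output is implied.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

struct abc
{
    int a;
    char b;
    int c;
};

/* Sum of the member sizes, ignoring any padding. */
size_t abc_member_bytes(void)
{
    return sizeof(int) + sizeof(char) + sizeof(int);
}

/* Total padding bytes the compiler added to the struct. */
size_t abc_padding_total(void)
{
    return sizeof(struct abc) - abc_member_bytes();
}

/* Padding bytes inserted between b and c. */
size_t abc_padding_after_b(void)
{
    return offsetof(struct abc, c) - (offsetof(struct abc, b) + sizeof(char));
}

/* Prints the layout chosen by this implementation. */
void abc_print_layout(void)
{
    printf("sizeof(struct abc) = %zu bytes\n", sizeof(struct abc));
    printf("offsets: a=%zu b=%zu c=%zu\n",
           offsetof(struct abc, a), offsetof(struct abc, b), offsetof(struct abc, c));
    printf("padding after b = %zu, total padding = %zu\n",
           abc_padding_after_b(), abc_padding_total());
}
```

Calling `abc_print_layout()` from `main` under different compiler options is a quick way to see whether two builds agree on the layout.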

Brett McLain
  • 3
    @DavidPfeffer: I'm not sure accepting the only answer that doesn't mention "undefined behavior" was the right thing to do. Sure, this answer seems to confirm your beliefs, but really you shouldn't do it. – ereOn Dec 13 '11 at 14:53
  • 3
    It's *not* legal, in the sense that it gives undefined behaviour. – Mike Seymour Dec 13 '11 at 14:53
  • To paraphrase: `This is legal in the sense that it gives undefined behavior and may print everything` - well in my book that's the complete opposite of legal. – Voo Dec 13 '11 at 15:00
  • I've edited my answer to reflect the fact that this is indeed illegal in that it results in undefined behaviour. – Brett McLain Dec 13 '11 at 15:11
  • 2
    The C++11 standard barely uses the word "legal", and doesn't define it. In every case where it does use it (that I can find), the thing it is describing as "illegal" is rejected by the compiler, or the thing it describes as "legal" is accepted and has defined behavior. So take your pick what you think it should mean, but for preference use standard terminology ("well-formed", "ill-formed", "UB") – Steve Jessop Dec 13 '11 at 15:28
  • Hmmm I think I agree with Steve Jessop. If you try to access memory that is not "yours", i.e. smash the stack, then the compiler or the OS itself will likely complain if you have buffer overflow protection turned on. – Brett McLain Dec 13 '11 at 15:37
3

Technically the behavior is undefined.

> While not the best programming practice, this does work.

Undefined behavior means anything can happen including what you expect. It might well crash on other implementations.

Prasoon Saurav
  • 3
    More important (and a more convincing argument) than potential "crashing" is the fact that the compiler may make assumptions about your code when optimizing which you are violating. – Kerrek SB Dec 13 '11 at 14:50