3

I was working and was considering using a union. I decided against it, because the design really called for a struct/class, but it eventually led to the following hypothetical question:

Suppose you have a union like this contrived example:

typedef union {
    char* array_c;
    float* array_f;
    int* array_i;
} my_array;

. . . and then you allocate one of the arrays and try deleting it from somewhere else:

my_array arr;
arr.array_f = (float*)(malloc(10*sizeof(float)));
free(arr.array_i);

I assume that this would work, although it is technically not defined, because of the way malloc is implemented. I also assume it would work when allocating array_c, even though, unlike int vs. float, the arrays are unlikely to be the same size.

The test could be repeated with new and delete, which are similar. I conjecture these would also work.

I'm guessing that the language specifications would hate me for doing this, but I would expect it would work. It reminds me of the "don't delete a new-ed pointer cast to void* even when it's an array not an object" business.

So, questions: what does the specification say about doing this? I checked briefly, but couldn't find anything that addresses this case in particular. How ill-advised is this anyway, from a functional perspective? (I realize that this is terrible from a clarity perspective.)

This is purely a curiosity question for pedantic purposes.

curiousguy
geometrian
  • "the arrays are unlikely to be the same size" -- that's irrelevant since the size isn't passed to free; it figures it out from information stored in the arena. All that is relevant is whether the pointer value passed to free is the same as the pointer value returned by malloc ... and the standard doesn't guarantee that. – Jim Balter Jun 29 '12 at 05:12
  • Exactly. That's why I assumed it would work. – geometrian Aug 03 '13 at 15:55
  • You assumed it would work because the standard doesn't guarantee that it would work? I suggest that you thoroughly read the answers and comments on this page. If your only interest is pedantic, then the pedantic answer is that it's undefined behavior. – Jim Balter Aug 05 '13 at 07:21
  • No; I assumed it would work _although_ the standard doesn't guarantee it would work. David Schwartz's answer described all the issues, which is why I accepted it. – geometrian Aug 11 '13 at 18:23

3 Answers

2

You're precisely correct. It breaks the rules:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values, but the value of the union object shall not thereby become a trap representation.
    - ISO/IEC standard 9899, section 6.2.6.1

However, the way implementations are typically done, it will "accidentally" work properly. Since free takes a void *, the parameter will be converted to a void * to pass to free. Since all the pointers are located at the same address and all the conversions to a void * involve no change to their value, the ultimate value passed to free will be the same as if the correct member was passed.
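To make the "same value" claim concrete, here is a minimal sketch that compares the bytes stored through `array_f` with the bytes of an `int*` holding the same address. The function name `members_share_representation` is hypothetical, and the check only inspects object representations with `memcmp`; it never reads `array_i`. A result of 1 is what mainstream platforms are expected to give, but nothing in the standard promises it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef union {
    char  *array_c;
    float *array_f;
    int   *array_i;
} my_array;

/* Hypothetical helper: returns 1 if the bytes written through array_f match
   the bytes of an int* holding the same address, 0 if they differ, and -1
   if malloc fails. */
int members_share_representation(void)
{
    void *raw = malloc(10 * sizeof(float));
    if (raw == NULL)
        return -1;

    my_array arr;
    arr.array_f = raw;   /* store through the float* member */
    int *as_int = raw;   /* ordinary conversion of the same address to int* */

    /* The union is at least as large as any one of its members. */
    assert(sizeof arr >= sizeof as_int);

    /* Compare raw bytes only; no inactive union member is read. */
    int same = sizeof(float *) == sizeof(int *)
            && memcmp(&arr, &as_int, sizeof as_int) == 0;

    free(raw);           /* free exactly the pointer malloc returned */
    return same;
}
```

If this returns 1 on your platform, that explains why the original snippet appears to work there; it says nothing about other implementations.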

Theoretically, an implementation could track which member of a union was accessed last and corrupt the value (or crash the program, or do anything else) if you read a different member from the one you last wrote. But to my knowledge, no actual implementation does anything like that.
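If you want to stay inside the rules yourself, one option is to do that tracking by hand. The following is only a sketch: the names `array_kind`, `tagged_array`, `tagged_array_alloc_floats`, and `tagged_array_free` are made up for illustration and do not come from the question.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tag recording which union member was written last. */
typedef enum { ARRAY_NONE, ARRAY_CHAR, ARRAY_FLOAT, ARRAY_INT } array_kind;

typedef struct {
    array_kind kind;
    union {
        char  *array_c;
        float *array_f;
        int   *array_i;
    } as;
} tagged_array;

/* Allocates count floats and records that array_f is the active member.
   Returns 1 on success, 0 if malloc fails. */
int tagged_array_alloc_floats(tagged_array *a, size_t count)
{
    assert(a != NULL);
    a->as.array_f = malloc(count * sizeof(float));
    a->kind = (a->as.array_f != NULL) ? ARRAY_FLOAT : ARRAY_NONE;
    return a->kind == ARRAY_FLOAT;
}

/* Frees through the same member that was stored, so no inactive member
   is ever read. */
void tagged_array_free(tagged_array *a)
{
    assert(a != NULL);
    switch (a->kind) {
    case ARRAY_CHAR:  free(a->as.array_c); break;
    case ARRAY_FLOAT: free(a->as.array_f); break;
    case ARRAY_INT:   free(a->as.array_i); break;
    case ARRAY_NONE:  break;
    }
    a->kind = ARRAY_NONE;
}
```

The cost is one extra enum per object; in exchange, the question of which member is "correct" never comes up.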

David Schwartz
  • What rule is broken in this situation? It seems to me that this does not rely on any undefined behavior? – dsharlet Jun 29 '12 at 04:04
  • @dsharlet You aren't allowed to access a different member of a union than the one you set. Doing so invokes undefined behavior. – Antimony Jun 29 '12 at 04:05
  • The rule talks about bytes outside of member being set; there are no such bytes in the OP case - all pointers are of the same size. I think that rule 6.5.2.3.5 applies to his post: "One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible." I am not sure if it applies to pointers, though. – Sergey Kalinichenko Jun 29 '12 at 04:18
  • 1
    @dasblinkenlight: That applies to pointers, but doesn't apply to this case. There is no common initial sequence. Also, your interpretation would make that section baffling -- what would happen to the parts that do overlap? In any event, the problem is really not so much that 6.2.6.1 is violated but that there's no particular reason you should get any particular answer. The implementation is allowed to use different formats for pointers to different types and different rules for how to cast them to `void *`. – David Schwartz Jun 29 '12 at 04:23
  • @dasblinkenlight Since pointers aren't structures, 6.5.2.3.5 certainly does not apply. This is undefined behavior just as if these pointers weren't part of a union -- you can't store into one and expect to read that value from another. Nothing in the standard says they must be so aliased. – Jim Balter Jun 29 '12 at 04:43
  • 1
    "The implementation is allowed to use different formats for pointers" -- indeed. 6.2.5 explicitly says that pointers to types other than unions need not have the same representation or alignment requirements. – Jim Balter Jun 29 '12 at 04:49
  • "Theoretically, an implementation could track which member of a union was accessed last" -- I believe there are implementations of C interpreters that keep track of which member was stored into. – Jim Balter Jun 29 '12 at 04:51
  • @JimBalter "_there are implementations of C interpreters that keep track of which member was stored into._" but it isn't as simple as the Pascal variant type, because for union of structures (not relevant to this example), you are allowed to inspect the common initial part. The checking code will be tricky. – curiousguy Jul 18 '12 at 03:25
  • @dasblinkenlight "_it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible." I am not sure if it applies to pointers, though._" It doesn't apply to pointers with different types. – curiousguy Jul 18 '12 at 03:28
2

This is undefined behavior because you are accessing a different member than you set. It can do literally anything.

In practice, this will usually work, but you can't rely on it. Compilers and toolchains are not deliberately evil, but there have been cases where optimizations interacted with undefined behavior to produce completely unexpected results. And of course if you're ever on a system with a different malloc implementation, it will probably blow up.

Antimony
  • "It can do literally anything" -- Undefined behavior simply means that the standard doesn't mandate what a conforming implementation must do. The reality is that your programs can always "do anything" within the capabilities of the hardware, because the standard is not physical law. " if you're ever on a system with a different malloc implementation" -- what does that mean? Different from what? The implementation of malloc isn't relevant here; the only thing relevant is whether the pointer members are exact aliases (location, representation, alignment) in the implementation. – Jim Balter Jun 29 '12 at 04:58
  • 1
    @JimBalter: *"The reality is that your programs can always "do anything" within the capabilities of the hardware, because the standard is not physical law."* -- That's stupid. This is an answer to a question with the C and C++ tags. One should assume that in any statements about what is and is not allowed, there is an implied "as far as the C (or C++) standard is concerned". Please don't require us to spell that out every time. – Benjamin Lindley Jun 29 '12 at 05:10
  • 1
    I've flagged your comment and will not address your misunderstanding of what I wrote. – Jim Balter Jun 29 '12 at 05:15
  • 1
    @JimBalter: So, if I misunderstood you, what did you take exception to when Antimony said "It can literally do anything"? Because I took it to mean that you had a problem with him not qualifying it as "It can literally do anything, as far as the standard is concerned." -- Am I wrong? – Benjamin Lindley Jun 29 '12 at 05:24
  • The standard imposes requirements that must be met for an implementation to qualify as conforming. The definition of undefined behavior is "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements". Talk about "it can do literally anything" is a category error; it misconstrues and misleads about the relationship between standards, implementations, and program behavior. – Jim Balter Jun 29 '12 at 05:33
  • 2
    The statements are close enough to the same meaning, yours is just more wordy, and harder to decipher the meaning of. – Benjamin Lindley Jun 29 '12 at 05:44
-1

It has nothing to do with the malloc() implementation. The union in your example uses the same memory location to store one of three "different" pointers. However, all pointers, no matter what they point to, are the same size - which is the native integer size of the architecture you're on - 32 bits on 32-bit systems and 64 bits on 64-bit systems, etc. This is because a pointer is an address in memory, which may be represented by an integer.
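The size claim is easy to check on a given platform. Here is a minimal sketch, with the hypothetical function name `data_pointer_sizes_match`, that prints the sizes of the pointer types involved and reports whether they agree. Note the assumption baked into the claim: equal sizes are what mainstream 32-bit and 64-bit systems give, but the C standard only guarantees that `void*` and `char*` share a representation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints the size of each pointer type used in the
   union and returns 1 if they are all equal on this platform, else 0. */
int data_pointer_sizes_match(void)
{
    /* The one guarantee the standard does make: void* and char* have the
       same representation, hence the same size. */
    assert(sizeof(void *) == sizeof(char *));

    printf("char*  : %zu bytes\n", sizeof(char *));
    printf("float* : %zu bytes\n", sizeof(float *));
    printf("int*   : %zu bytes\n", sizeof(int *));
    printf("void*  : %zu bytes\n", sizeof(void *));

    return sizeof(char *) == sizeof(float *)
        && sizeof(float *) == sizeof(int *)
        && sizeof(int *) == sizeof(void *);
}
```

A return value of 1 only describes the machine it ran on; it is not evidence about word-addressed or segmented architectures.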

Let's say your arr is located at address 0x10000 (the pointer to your pointer, if you will.) Let's say malloc() finds you a memory location at 0x666660. You assign arr.array_f to this value - which means you store the pointer 0x666660 in the location 0x10000. Then you write your floats into 0x666660 to 0x666688.

Now, you attempt to access arr.array_i. Because you're using a union, the address of arr.array_i is the same as the address of arr.array_f and arr.array_c. So you are reading from the address 0x10000 again - and you read out the pointer 0x666660. Since this is the same pointer malloc returned earlier, you can go ahead and free it.

That said, attempting to interpret integers as text, or floating point numbers as integers, etc, will clearly lead to ruin. If arr.array_i[0] == 1, then arr.array_f[0] will definitely not == 1 and arr.array_c[0] will have no bearing on the character '1'. You can try "viewing" memory this way as an exercise (loop and printf()) - but you won't achieve anything.
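The "viewing" exercise can be done without reading the wrong union member by copying the bytes with `memcpy`. This is a sketch under two assumptions: `float` is 32 bits wide (checked at run time by the assert) and uses IEEE 754 format. The function name `int_bits_as_float` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterprets the bytes of a 32-bit integer as a
   float by copying them, rather than by reading another union member. */
float int_bits_as_float(int32_t value)
{
    float out;
    assert(sizeof out == sizeof value);   /* assumes a 32-bit float */
    memcpy(&out, &value, sizeof out);     /* copy the object representation */
    return out;
}
```

Under those assumptions, `int_bits_as_float(1)` is a tiny subnormal number rather than 1.0f, which is the point the paragraph above makes.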

lyngvi
  • But there is a problem in that `char*` and `float*` do not have to be the same size. See [Are all data pointers of the same size in one platform](http://stackoverflow.com/questions/1241205/are-all-data-pointers-of-the-same-size-in-one-platform). – Bo Persson Jun 29 '12 at 10:22
  • @BoPersson: They don't have to be. But it's hard to imagine how `free` could work on such a system, unless casting to a `void *` did something very strange. – David Schwartz Jun 29 '12 at 12:04
  • 3
    @David - Somewhat unusual, but word-addressed machines can have one size pointers for `char*` and `void*` and another size for `int*` and `float*`. This goes against *"However, all pointers, no matter what they point to, are the same size."* in this answer. Some of us also remember that 16-bit Windows didn't have all 16-bit pointers. – Bo Persson Jun 29 '12 at 12:33
  • Yeah, yeah, 16-bit x86 used 20-bit pointers based on a 16-bit segment offset << 4 plus another offset. I foolishly assumed we were talking about systems less than 20 years old. – lyngvi Jun 29 '12 at 22:12
  • Hmm. Never seen a system with that architecture, but maybe there're some micros out there that do that. In which case, the pointers would still be stored in the same location (it is a union), but might be of different sizes - and the stdlib malloc/free implementations would not be usable, as malloc() would not have the type information needed to choose which memory bus to use. But yeah, the "view" trick wouldn't work there. – lyngvi Jun 29 '12 at 22:20