53

First, to clarify, I am not talking about dereferencing invalid pointers!

Consider the following two examples.

Example 1

typedef struct { int *p; } T;

T a = { malloc(sizeof(int)) };
free(a.p);  // a.p is now indeterminate?
T b = a;    // Access through a non-character type?

Example 2

void foo(int *p) {}

int *p = malloc(sizeof(int));
free(p);   // p is now indeterminate?
foo(p);    // Access through a non-character type?

Question

Do either of the above examples invoke undefined behaviour?

Context

This question is posed in response to this discussion. The suggestion was that, for example, pointer arguments may be passed to a function via x86 segment registers, which could cause a hardware exception.

From the C99 standard, we learn the following (emphasis mine):

[3.17] indeterminate value - either an unspecified value or a trap representation

and then:

[6.2.4 p2] The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.

and then:

[6.2.6.1 p5] Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

Taking all of this together, what restrictions do we have on accessing pointers to "dead" objects?

Addendum

Whilst I've quoted the C99 standard above, I'd be interested to know if the behaviour differs in any of the C++ standards.

Oliver Charlesworth
  • 3
    You cited the Standard in an excellent manner - from those words, it's clear to me that using an invalid pointer in any way, even without dereferencing it, invokes undefined behavior. –  Jun 10 '13 at 13:20
  • I don't see where this should come from. As long as you only pass the pointer around, nothing happens. Of course, it obviously doesn't make sense, because you cannot use this pointer anyway, but passing it around is virtually the same as having an uninitialized pointer. – Devolus Jun 10 '13 at 13:29
  • 1
    @Devolus: Yes, that was my intuition too. But the standard seems relatively unambiguous. And AProgrammer made a good point (in the linked discussion), that if segment registers get involved, this really could lead to an HW exception. – Oliver Charlesworth Jun 10 '13 at 13:31
  • @Devolus, what we're trying to understand is: "is passing it around safe?" – Shahbaz Jun 10 '13 at 13:31
  • `free` does not modify its argument. The pointer passed to `free` still points to the same location afterwards. The call to `free` simply informs the standard library that the object is no longer 'in use' and the storage at that location can be re-used. This is not the same as the object 'reaching the end of its lifetime', which occurs for objects on the stack. – willj Jun 10 '13 at 13:40
  • 3
    @willj: That's correct. But nevertheless, the standard tells us that the pointer is now indeterminate. – Oliver Charlesworth Jun 10 '13 at 13:42
  • C++ recently made this implementation-defined, see [DR 1438](http://www.open-std.org/JTC1/SC22/WG21/docs/cwg_defects.html#1438), because it won't actually trap on all systems – Jonathan Wakely Jun 10 '13 at 13:45
  • The pointer is indeterminate if the object has reached the end of its lifetime.. where does it say that 'free' causes an object to 'reach the end of its lifetime'? As I can roll my own implementation of `malloc` and `free`, I guess that an implementation is not permitted to give them special treatment. – willj Jun 10 '13 at 13:45
  • 1
    "Rolling your own" `malloc` and `free` invokes undefined behavior already. 7.1.3: "If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined." – R.. GitHub STOP HELPING ICE Jun 10 '13 at 13:46
  • @Oli: ah, then I stand corrected ;) – willj Jun 10 '13 at 13:49
  • @R..: I meant that I can roll my own `customMalloc()` and `customFree()` - in which case object lifetime would be unaffected. – willj Jun 10 '13 at 13:51
  • 3
    @willj, it's not about modifying that value. Most probably the pointer still has the same value. However, if that value gets copied somewhere, it may pass through a special pointer register (e.g. segment register in x86) where the hardware could cause a trap due to the pointer being invalid. – Shahbaz Jun 10 '13 at 14:03
  • @Oli Charlesworth I think you are reading "between the lines" a bit. The standard says that a pointer is indeterminate if the object pointed at reaches the end of its lifetime. But this is cited from 6.2.4, the chapter about storage duration. One may argue that the cited text only refers to a pointer to an object that has reached the end of its scope, since that chapter starts by stating `"Allocated storage is described in 7.22.3"`. In other words, allocated storage is a special case where 6.2.4 doesn't necessarily apply. – Lundin Jun 10 '13 at 14:20
  • But unfortunately, there's no useful information in 7.22.3 regarding the topic, or what happens with a pointer when you pass it to free() - whether it is formally turning indeterminate or not. – Lundin Jun 10 '13 at 14:21
  • @Lundin: Hmm, that's not how I interpret it. I don't see allocated storage as a special case, it's simply described in a separate section for convenience. However, if your interpretation is correct, we could simply rewrite both my examples to use pointers to automatic objects that have died... – Oliver Charlesworth Jun 10 '13 at 14:26
  • @OliCharlesworth It is far from obvious how to interpret it. After a second reading of C11 6.2.4 I found that the chapter defines the lifetime for static and automatic objects (and for thread storage in C11), but not for "allocated" ones. Yet in C11 7.22.3, there is a sentence stating: `The lifetime of an allocated object extends from the allocation until the deallocation.` That line seems to go well together with the text you cited from 6.2.4. – Lundin Jun 10 '13 at 14:34
  • @Lundin: It's when the object has reached the end of its *lifetime*, not (necessarily) the end of its *scope*. (Scope is a region of program text over which an identifier is visible.) – Keith Thompson Jun 10 '13 at 17:39
  • @OliverCharlesworth can I suggest changing this to a C question? Since C and C++ are considerably different in this area , this question would get confusing if C++ answers were added. There could be a different thread made for the C++ version. (The existing C++ answer that has been posted actually doesn't answer the question at all) – M.M Jun 07 '15 at 14:57
  • @MattMcNabb: Sure, if you like. The C++ part of the question was only ever added as an addendum... – Oliver Charlesworth Jun 07 '15 at 14:58
  • @MattMcNabb Jonathan Wakely already mentioned DR 1438. Non-dereference use of invalid pointers: "The current Standard says that any use of an invalid pointer value produces undefined behavior (3.7.4.2 [basic.stc.dynamic.deallocation] paragraph 4). This includes not only dereferencing the pointer but even just fetching its value." Nothing to add here. – curiousguy Jun 07 '15 at 15:04
  • @curiousguy C++ doesn't clearly define what an invalid pointer is ; the amount of discussion generated on [this question](http://stackoverflow.com/questions/30694069/is-it-legal-to-compare-dangling-pointers/30694084) suggests that it is not so simple – M.M Jun 07 '15 at 15:12

3 Answers

31

Example 2 is invalid. The analysis in your question is correct.

Example 1 is valid. A structure type never holds a trap representation, even if one of its members does. This means that structure assignment, on a system where trap representations would cause problems, must be implemented as a bytewise copy, rather than a member-by-member copy.

6.2.6 Representations of types

6.2.6.1 General

6 [...] The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
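
The bytewise copy mentioned above can be sketched in C as follows. This is only an illustration, not something taken from the standard: `copy_bytes`, `same_bytes`, and `demo` are hypothetical names, and the sketch assumes that the bytes of `a` are stable when read twice, which holds on typical implementations. The point is that `memcpy` and `memcmp` access the object as an array of character type, so the freed member `p` is never read through an lvalue of pointer type.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int *p; } T;

/* Hypothetical helper: copies a T byte by byte, so the member p is
   never read through an lvalue of pointer type. */
static void copy_bytes(T *dst, const T *src) {
    memcpy(dst, src, sizeof *dst);
}

/* Hypothetical helper: returns 1 if both objects have identical
   object representations, again using only character-type access. */
static int same_bytes(const T *x, const T *y) {
    return memcmp(x, y, sizeof *x) == 0;
}

/* Returns 1 if the copy matches the original byte for byte,
   or -1 if the allocation failed. */
static int demo(void) {
    T a = { malloc(sizeof(int)) };
    if (a.p == NULL) {
        return -1;
    }
    free(a.p);            /* the value of a.p is now indeterminate */

    T b;
    copy_bytes(&b, &a);   /* bytes are copied; a.p itself is not read */
    return same_bytes(&a, &b);
}
```

After `demo` returns, neither `a.p` nor `b.p` may be read as a pointer; only their bytes were inspected.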

  • Ah, that's interesting. I hadn't noticed that clause. Thanks! – Oliver Charlesworth Jun 10 '13 at 13:47
  • Since the issue isn't trap representations but indeterminate values, I don't think the issue is resolved by the cited text. Per J.2 (albeit non-normative), UB results if "The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8)." However, perhaps in this case it is the value of the member, not the value of the structure, that is indeterminate, in which case the value of the object with indeterminate value is not used. – R.. GitHub STOP HELPING ICE Jun 11 '13 at 01:17
  • @R. J.2 is outdated. The normative text (of C99, anyway) only disallows reading objects that hold trap representations. If they are indeterminate but cannot hold trap representations, reading is allowed. This is important for, for example, `unsigned char` too. –  Jun 11 '13 at 06:49
  • @R.. There's [DR 338](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm) that is supposed to tighten the rules somewhat again, but I don't see it in a draft of C11 (perhaps it was included after the last public draft), so I'm not sure how that affects my answer here. –  Jun 11 '13 at 06:56
  • @hvd: That seems more like how a standard should be written, though I wish writers would further specify that the *existence* of trap representations *or things that behave like them* must be implementation-defined, though the *consequences* need not be. – supercat Apr 28 '15 at 19:25
  • @supercat Any implementation where `malloc` can succeed for a size of at least 2 must have trap representations: add one byte to `malloc`'s result, and you get a pointer that is not allowed to compare equal to any pointer value that was valid just before `malloc` was called. Because of that, before `malloc` was called, that representation was a trap representation. –  Apr 28 '15 at 20:20
  • @hvd: The uses I have seen for the term "trap representation" imply a value which, when read as an rvalue, will disrupt normal program flow in some fashion which would hopefully be recognizable as a trap, but whose particulars are beyond the scope of the C standard. Basically, what I would like to see would be for the Standard to say that an implementation should have to specify under what cases the statement `p=q;` (given unaliased variables `p` and `q` of the same type (any type)) might do anything other than make `p` hold a value which is at least as well defined as what's in `q`. – supercat Apr 28 '15 at 20:32
  • @supercat The standard has a very specific definition for a trap representation: it's a representation that doesn't represent a value. C99 6.2.6.1p5: "Certain object representations need not represent a value of the object type. [...] Such a representation is called a *trap representation*." You mean something else by it. Anyway, as of C11, reading indeterminate values is mostly undefined again, even if the type has no trap representations, so it wouldn't get you much. –  Apr 28 '15 at 20:41
  • @hvd: My point is that there are a lot of cases where it would be acceptable for code which is given invalid data to interpret it as arbitrary gibberish, and in some such cases would also be acceptable for the program to trap in recognizable fashion, but where adherence to the laws of causality is required in any case. The *reason* that reading a trap representation was defined as Undefined Behavior, rather than merely yielding an unspecified value was to allow for the possibility that such accesses might disrupt program flow in ways outside the scope of the Standard. – supercat Apr 28 '15 at 20:49
  • 1
    @supercat [Analyzability](http://en.cppreference.com/w/c/language/analyzability) may be of interest for that (but not for your earlier comments). As of C11, an implementation can define `__STDC_ANALYZABLE__` to indicate that the effects of undefined behaviour are limited, except for critical undefined behaviour. And reading trap representations is not critical undefined behaviour: if `__STDC_ANALYZABLE__` is defined, it may cause the program to abort, but it may not completely corrupt the execution of the program. –  Apr 28 '15 at 20:51
  • @hvd: Thanks a million for that; I wonder why I've not seen it mentioned anywhere before? If code can safely use a constraint handler to longjmp back to sanity, that's a major help to many optimizations. IMHO, having a program require analyzability would seem like it could in many cases enable much more useful optimizations than would be enabled by letting compilers go crazy. Being able to specify non-trapping could help in a few more cases, e.g. `uint32_t x,y,z; ... x=y*z;` could fail on systems where `int` is 33-64 bits, but disabling traps would make sane implementations "just work". – supercat Apr 28 '15 at 21:06
  • @supercat I don't know if any implementations support it, and that may be why I've only rarely seen it mentioned either. As for `x=y*z;`, I've suggested `x=1U*y*z;` in the past if an implementation is found where `uint32_t` exists and promotes to a signed type. Yeah, it's ugly, and it really shouldn't be necessary, but if you want to support common compilers like GCC (known to optimise aggressively), you will end up needing something like that anyway. –  Apr 28 '15 at 21:12
  • @hvd: Too bad people are working harder to break analyzability than to support it. A couple of abilities that analyzability still doesn't seem to provide, but that most implementations could provide in practice if trapping were bypassed, would be to (1) determine whether `realloc` has moved an allocation (in general, comparisons between live and dead pointers cannot be expected to be meaningful, but in this case they should be) and (2) given two pointers which have not been modified since they pointed to the same live object, report what the displacement was (in units of `char*`) when the object was alive. – supercat Apr 28 '15 at 21:33
  • @hvd: The above operations should not access unowned memory, and while ideally all operations which would *produce* an invalid pointer without special "permission" would be trapped, neither operation produces any kind of pointer. As such, even though they involve dead pointers, it should be possible for any platform to perform them safely by, at worst, using `memcpy` to copy the pointers to a suitably-sized `char[]`, shuffling any bits as required to yield an integer that can be used for the comparison or subtraction (a library macro could exploit UB to do such things faster, though). – supercat Apr 28 '15 at 21:46
15

My interpretation is that while only non-character types can have trap representations, an object of any type can have an indeterminate value, and that accessing an object with an indeterminate value in any way invokes undefined behavior. The most infamous example might be OpenSSL's invalid use of uninitialized objects as a random seed.

So, the answer to your question would be: never.

By the way, an interesting consequence of not just the pointed-to object but the pointer itself being indeterminate after free or realloc is that this idiom invokes undefined behavior:

void *tmp = realloc(ptr, newsize);
if (tmp != ptr) {
    /* ... */
}
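
One commonly suggested way to sidestep the comparison, which is not part of the answer above, is to convert the old pointer to an integer while it is still valid and compare integers afterwards. The sketch below assumes that the optional type `uintptr_t` exists, that `newsize` is nonzero, and that equal pointers convert to equal integers, which is implementation-defined rather than guaranteed. `resize_and_check_moved` is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resizes the block *pp to newsize bytes and stores
   in *moved whether the block's address changed. Returns 0 on success and
   -1 on failure, in which case *pp is left untouched and still valid. */
static int resize_and_check_moved(void **pp, size_t newsize, int *moved) {
    /* Convert while *pp is still a valid pointer value. */
    uintptr_t before = (uintptr_t)*pp;

    void *tmp = realloc(*pp, newsize);
    if (tmp == NULL) {
        return -1;
    }

    /* Only integers are compared here; the old pointer is not read again. */
    *moved = ((uintptr_t)tmp != before);
    *pp = tmp;
    return 0;
}
```

A caller would pass the address of its own `void *` variable and then continue to use only the updated pointer.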
R.. GitHub STOP HELPING ICE
  • 1
    Re "accessing an object ..."; there is a footnote in the standard which I didn't quote above: "*Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it.*" It sounds like *writing* to such an object is acceptable. – Oliver Charlesworth Jun 10 '13 at 13:36
  • 3
    @OliCharlesworth, of course it is. Otherwise how can you do something like: `free(x); x = NULL;`? – Shahbaz Jun 10 '13 at 13:37
  • @Shahbaz: Indeed! I'm just having trouble parsing the standard in such a way that it allows this kind of thing ;) – Oliver Charlesworth Jun 10 '13 at 13:37
  • 4
    @OliCharlesworth, I think the part that says: _If the stored value of an object has such a representation and is **read** by an lvalue expression..._, shows that it can be written to, but not read from. – Shahbaz Jun 10 '13 at 14:06
  • 1
    void *tmp = realloc(ptr, newsize); << if realloc does fail, then tmp is valid (NULL) and ptr remains valid as well. This is not UB when tmp==NULL. – jim mcnamara Jun 10 '13 at 15:13
  • 4
    @jimmcnamara: Of course. But it's UB in the success case, which was the point. – R.. GitHub STOP HELPING ICE Jun 10 '13 at 20:18
  • The Standard explicitly guarantees that structures will never have trap representations. I would be hard-pressed to identify any case where that would be meaningful if copying a struct whose value was at least partially indeterminate would have any effect beyond producing a copy whose value might likewise be partially indeterminate. – supercat Oct 03 '17 at 19:47
-1

C++ discussion

Short answer: In C++, there is no such thing as "reading" a class instance; you can only "read" a non-class object, and this is done by an lvalue-to-rvalue conversion.

Detailed answer:

typedef struct { int *p; } T;

T designates an unnamed class. For the sake of the discussion let's name this class T:

struct T {
    int *p; 
};

Because you did not declare a copy constructor, the compiler implicitly declares one, so the class definition reads:

struct T {
    int *p; 
    T (const T&);
};

So we have:

T a;
T b = a;    // Access through a non-character type?

Yes, indeed; this is initialization by copy constructor, so the copy constructor definition will be generated by the compiler; the definition is equivalent to

inline T::T (const T& rhs) 
    : p(rhs.p) {
}

So you are accessing the value as a pointer, not a bunch of bytes.

If the pointer value is invalid (not initialized, freed), the behavior is not defined.

curiousguy
  • Actually an lvalue to rvalue conversion can be done for class lvalues too. The context is when passing a class lvalue through the ellipsis in a function call. – Johannes Schaub - litb Jun 16 '13 at 09:33
  • @JohannesSchaub-litb Yes you can. [conv.lval] "Otherwise, if the glvalue has a class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary" So this conversion is defined in terms of the ctor, and we go back to accessing each member one by one, with an lvalue-to-rvalue conversion for each one. – curiousguy Jun 16 '13 at 10:35
  • that is correct. At least as far as nonunion class objects are concerned. Unions are copied "bitwise". – Johannes Schaub - litb Jun 16 '13 at 11:20
  • This all has nothing to do with the question except for the last sentence ... which you give no justification for. – M.M Jun 07 '15 at 14:30
  • @MattMcNabb Huh? This has everything to do with the question... I don't know what you are trying to say. – curiousguy Jun 07 '15 at 15:01
  • 1
    The examples in the question are about using a pointer after the space it points to has been freed . In your code you copy an uninitialized pointer, which is different. Also, all the stuff about the class is irrelevant, you could equally well have written `int *a; int *b = a;` – M.M Jun 07 '15 at 15:05
  • "_all the stuff about the class is irrelevant_" all the stuff about the class relates to "Example 1" in the question! – curiousguy Jun 15 '15 at 23:08