
A blog author has brought up a discussion about null pointer dereferencing:

I've put some counter arguments here:

His main line of reasoning quoting the standard is this:

The '&podhd->line6' expression is undefined behavior in the C language when 'podhd' is a null pointer.

The C99 standard says the following about the '&' address-of operator (6.5.3.2 "Address and indirection operators"):

The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.

The expression 'podhd->line6' is clearly not a function designator, the result of a [] or * operator. It is an lvalue expression. However, when the 'podhd' pointer is NULL, the expression does not designate an object since 6.3.2.3 "Pointers" says:

If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

When "an lvalue does not designate an object when it is evaluated, the behavior is undefined" (C99 6.3.2.1 "Lvalues, arrays, and function designators"):

An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.

So, the same idea in brief:

When -> was executed on the pointer, it evaluated to an lvalue where no object exists, and as a result the behavior is undefined.
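
To make the quoted argument concrete, here is a minimal C sketch of the pattern under discussion. The struct layouts and the helper name `podhd_line6` are hypothetical stand-ins, not the original code. The helper tests the pointer before forming `&podhd->line6`, so the expression is only evaluated when `podhd` actually designates an object.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the types behind 'podhd->line6'. */
struct line6 { int id; };
struct podhd { int flags; struct line6 line6; };

/* Returns the address of the embedded member, or NULL when there is no
   enclosing object. The null test comes first, so '&podhd->line6' is
   never evaluated on a null pointer. */
struct line6 *podhd_line6(struct podhd *podhd)
{
    if (podhd == NULL)
        return NULL;
    return &podhd->line6;
}
```

Whether the unguarded form is undefined is exactly what is being asked here; the guard simply sidesteps the question.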

This question is purely language based, I'm not asking regarding whether a given system allows one to tamper with what lies at address 0 in any language.

As far as I can see, there's no restriction on dereferencing a pointer variable whose value is equal to nullptr, even though comparisons of a pointer against the nullptr (or (void *) 0) constant can vanish in optimizations in certain situations because of the stated paragraphs. That looks like a separate issue, though: it doesn't prevent dereferencing a pointer whose value is equal to nullptr. Note that I've checked other SO questions and answers (I particularly like this set of quotations) as well as the standard quotes above, and I didn't find anything from which it clearly follows that if a pointer ptr compares equal to nullptr, dereferencing it is undefined behavior.

At most what I get is that dereferencing the constant (or its cast to any pointer type) is UB, but nothing is said about a variable whose bits equal the value that nullptr produces.

I'd like to clearly separate the nullptr constant from a pointer variable that holds a value equal to it. But an answer that addresses both cases would be ideal.

I do realise that optimizations can kick in when there are comparisons against nullptr, etc., and may simply strip code based on that.

If the conclusion is that dereferencing ptr is definitely UB whenever it equals the value of nullptr, another question follows:

Do C and C++ standards imply that a special value in the address space must exist solely to represent the value of null pointers?

oblitum
  • How would you obtain a pointer that is equal to the bit pattern of a null pointer but not an explicit null pointer, without invoking some sort of undefined behaviour in the process? – Arkku Feb 17 '15 at 23:31
  • Yes, this must have another answer somewhere. – Iharob Al Asimi Feb 17 '15 at 23:34
  • Is [this question](http://stackoverflow.com/q/5248877/1392132) related? You can traverse the linked list of duplicates if you feel like doing so. The tail node has a pretty good answer. – 5gon12eder Feb 17 '15 at 23:39
  • @5gon12eder I don't have time to read so many related questions ;-) Sometimes it's better to ask with the guts of trying to better target your doubts. – oblitum Feb 17 '15 at 23:40
  • Pick a language tag; this is quite different in C than in C++. Using `nullptr` suggests you want to ask about C++, but then you quote from the C standard. – M.M Feb 17 '15 at 23:56
  • @MattMcNabb I see (notice that I use `(void *) 0` too), but in truth I'd like to get answers for both, from both standards. Otherwise I would need to split almost identical questions just because of the tag change. – oblitum Feb 17 '15 at 23:59
  • @Arkku Having a pointer variable set to null pointer value is not UB. Dereferencing a pointer variable may be UB. Stating that `*(int *)(void *)0` (dereferencing the null pointer constant) is UB is not the same as stating that `*ptr` (or `ptr->` or `(*ptr)()` ) is UB if its value is `0`. – oblitum Feb 18 '15 at 00:28
  • @pepper_chico I didn't say that having a null pointer value would be UB (that would be quite absurd =)… But my point was: how would you “legally” obtain a dereferenceable pointer data type with the null pointer's bit pattern without explicitly creating a null pointer? There's no way that a valid pointer just happens to have the same bits as a null pointer of that type, because each pointer type must have a distinct null pointer (otherwise everything that needs to be able to return a null pointer would break). – Arkku Feb 18 '15 at 01:35
  • @Arkku I was not implying that you were saying that, that's just part of the reasoning in the entire sentence I've written. – oblitum Feb 18 '15 at 02:51
  • Also worth noting is that even _offsetting_ a null pointer yields UB, which borders on the idea that manipulating invalid pointers is similar to manipulating trap representations themselves (although a null pointer _isn't_ itself a trap representation). – alecov Jan 17 '17 at 02:54

3 Answers


As you quote C, dereferencing a null pointer is clearly undefined behavior from this Standard quote (emphasis mine):

(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"

102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

Exact same quote in C99 and similar in C89 / C90.
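
As a minimal sketch of how code typically stays clear of the case named in that footnote, the hypothetical helper `read_int` below (not taken from the standard or from the question) applies the unary `*` operator only after checking that the pointer is not null:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies *src into *out only when both pointers are
   non-null, so unary * is never applied to a null pointer. */
bool read_int(const int *src, int *out)
{
    if (src == NULL || out == NULL)
        return false;
    *out = *src;
    return true;
}
```

This guards only against the null pointer case. The other invalid values listed in the footnote (a misaligned address, an object past the end of its lifetime) cannot be detected portably at run time in this way.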

ouah
  • AHHH, thanks, 102, it's this one that I was missing, something saying that a variable being assigned to nullptr is invalid when dereferenced. – oblitum Feb 17 '15 at 23:35
  • @pepper_chico 87) in C99 and 102) in C11. – ouah Feb 17 '15 at 23:36
  • ...but the standard never actually specifies (in any normative text) when a pointer value is "invalid". Presumably it's when it doesn't refer to a function or to a pointer, but in those cases, the sentence you quote is unnecessary, because then the behaviour of `*` is already undefined by omission. –  Feb 17 '15 at 23:37
  • @hvd That's right it's in a footnote. I also take it here as an invalid value for the unary `*` operator as in the Rationale C99 (also non-normative), they also implicitly say an invalid pointer is a pointer that is not null and does not point a proper object or function. – ouah Feb 17 '15 at 23:41
  • Could you address the ending concluding questions? – oblitum Feb 17 '15 at 23:44
  • @pepper_chico I honestly think your last question deserves its own separate question. – ouah Feb 17 '15 at 23:54
  • @ouah it's hard to build the right context for asking those questions, I know SO and sometimes it shows up with too much unrelated questioning about a given question. I may try it later by explicitly referring to this one. – oblitum Feb 17 '15 at 23:57

C++

dcl.ref/5.

There shall be no references to references, no arrays of references, and no pointers to references. The declaration of a reference shall contain an initializer (8.5.3) except when the declaration contains an explicit extern specifier (7.1.1), is a class member (9.2) declaration within a class definition, or is the declaration of a parameter or a return type (8.3.5); see 3.1. A reference shall be initialized to refer to a valid object or function. [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by indirection through a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. — end note ]

The note is of interest, as it explicitly says dereferencing a null pointer is undefined.

I'm sure it says it somewhere else in a more relevant context, but this is good enough.

Blob
  • That's nice too, I'm waiting whether someone is going to address the ending questions... – oblitum Feb 17 '15 at 23:48
  • Notes are non-normative; it can't be inferred from this note that dereferencing a null pointer causes UB – M.M Feb 17 '15 at 23:59
  • This only covers binding a dereferenced null pointer to a reference. It doesn't cover anything more than that, and at least at some point in time, C++ committee members have stated that dereferencing the null pointer by itself (without binding the result to a reference, and without performing any lvalue-to-rvalue conversion) is *not* undefined. If I recall correctly, it's still undecided. –  Feb 18 '15 at 00:03
  • @pepper_chico I'm not sure I understand your question. The compiler changes "0" to any memory address that an object can't be placed in. The only requirements is that null pointers only compare equal to other null pointers. – Blob Feb 18 '15 at 00:05
  • @pepper_chico [Here you go.](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232) "**Notes from the October 2003 meeting:** [...] We agreed that the approach in the standard seems okay: `p = 0; *p;` is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior." –  Feb 18 '15 at 00:06
  • @hvd thanks, it shows that such an innocent question crops up even in the committee. I guess I'll just stop bothering about the validity of it until it's clearly stated and cleaned up in the standard. – oblitum Feb 18 '15 at 00:11
  • @Blob How does the compiler defines what address in the address space cannot be subject to hold any object or function. Isn't this a system's role? – oblitum Feb 18 '15 at 00:15
  • @pepper_chico Yes. The compiler would have to make "null pointer" point to an invalid memory address depending on what the system allows. The compiler would have to be written specifically for the system. – Blob Feb 18 '15 at 00:18
  • @Blob That's quite non-sense to me, even though it may be what the standard actually implies. This also varies given the level of privilege in a single platform, not just from platform to platform. – oblitum Feb 18 '15 at 00:19
  • @pepper_chico: There's not even a requirement that the pointer notion of the C++ compiler corresponds to the pointer notion of the OS itself. Therefore the compiler does have the freedom to define the effective address space spanned by _its_ pointer type. Thus the compiler address space can be the union of `nullptr` and the OS's address space. – MSalters Feb 18 '15 at 12:12
  • @MSalters that should be cumbersome to implement, I guess. And surely there's no implementation like it, although it's an interesting point from a theoretical point of view. – oblitum Feb 18 '15 at 16:00
  • @pepper_chico: Probably not for modern C++, but consider C on 8086 with its weird 20+ bit address space (20+ due to A20 weirdness), 32 bits (far) pointers, and a writeable interrupt table at physical `0000:0000`. Even the Pentium was somewhat weird with its 36 bit physical addresses. The AMD Athlon-64 probably was the first sane memory design in the x86 family. – MSalters Feb 18 '15 at 16:20
  • @MSalters thanks for your input, I've gone through some of this in my programming history too. I just don't see how this fits your point about a `nullptr` value outside of the address space (that still may be stored in pointer variables); maybe you're not referring to this point? – oblitum Feb 18 '15 at 23:00
  • @MSalters I can infer that addressing was so strange compared to common usage that this issue may become quite irrelevant in the context of standard compliance, compared to all the addressing issues as a whole. – oblitum Feb 18 '15 at 23:03
  • @MSalters: Aside from the lack of segment registers and load segment with immediate instructions, the 8086 architecture was quite brilliant; it's too bad languages really didn't support it well, and the 80286 and subsequent machines missed out on what made the architecture really work. If x86 had 32-bit segment registers whose upper few bits (e.g. 4) selected one of 16 address spaces, and logical addresses were formed by shifting the lower 28 bits of the segment register by an amount that was configurable on an address-space basis, then it would be practical to use 32-bit object refs... – supercat Apr 28 '15 at 17:06
  • @MSalters: ...to access terabytes' worth of objects (use 4/16ths of the address space for 16GB worth of objects on 16-byte boundaries, 1/16th for 1TB worth of large objects on 4KB boundaries, 1/16th for 16KB worth of huge objects on 64KB boundaries, and have 10/16ths left for whatever). The limiting factor for many programs' performance is caching, so I would think shrinking object refs from 64 bits to 32 could be a pretty big win. Even simply having logical address be `segreg<<4 + offs` as on the original 8086 would increase sixteenfold the amount of memory accessible using 32-bit references. – supercat Apr 28 '15 at 17:09

As I see it, the answer to what degree a NULL value may be dereferenced is that this is deliberately left platform-dependent in an unspecified manner, because of what is left implementation-defined in C11 6.3.2.3p5 and p6. This is mostly to support freestanding implementations used for developing boot code for a platform, as OP indicates in his rebuttal link, but it has applications for a hosted implementation too.

Re:
(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"

102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

This is phrased as it is, afaict, because each of the cases in the footnote may NOT be invalid for specific platforms a compiler is targeting. If there's a defect there, it's that "invalid value" should be italicized and qualified by "implementation-defined". For the alignment case, a platform may be able to access any type using any address and so have no alignment requirements, especially if address rollover is supported; and a platform may assume an object's lifetime only ends after the application has exited, allocating a new frame via malloc() for automatic variables on each function call.

For null pointers, at boot time a platform may have expectations that structures the processor uses have specific physical addresses, including at address 0, and get represented as object pointers in source code, or may require the function defining the boot process to use a base address of 0. If the standard didn't permit dereferences like '&podhd->line6', where a platform required podhd to have a base address of 0, then assembly language would be needed to access that structure. Similarly, a soft reboot function might need to dereference a 0 valued pointer as a void function invocation. A hosted implementation may consider 0 the base of an executable image, and map a NULL pointer in source code to the header of that image, after loading, as the struct required to be at logical address 0 for that instance of the C virtual machine.

What the standard calls pointers are more handles into the virtual address space of the virtual machine, where object handles have more requirements on what operations are permitted for them. How the compiler emits code that takes the requirements of these handles into account for a specific processor is left undefined. What is efficient for one processor may not be for another, after all.

The requirement on (void *)0 is rather that, wherever the source uses (void *)0, explicitly or by referencing NULL, the compiler emits code guaranteeing that the value actually stored is one that cannot point to any valid function definition or object under any mapping code. This does not have to be a 0! Similarly, casts of (void *)0 to (obj_type) and (func_type) are only required to yield values that evaluate as addresses the compiler guarantees are not in use at that time for objects or code. The difference with the latter is that these are unused, not invalid, so they are capable of being dereferenced in the defined manner.

The code that tests for pointer equality would then check, if one operand is one of these values, that the other is also one of the three, not just that the bit patterns match, because this scoreboards them with the RTTI of being a (null *) type, distinct from void, obj, and func pointer types to defined entities. The standard could be more explicit that it is a distinct type, even if unnamed because compilers only use it internally, but I suppose this is considered obvious from "null pointer" being italicized. Effectively, imo, a '0' in these contexts is an additional keyword token of the compiler, due to the additional requirement of it identifying the (null *) type, but it isn't characterized as such because this would complicate the definition of < identifiers >.

This stored value can be SIZE_MAX as easily as a 0, for a (void *)0, in emitted application code when implementations, for example, define the range 0 to SIZE_MAX-4*sizeof(void *) of virtual machine handles as what is valid for code and data. The NULL macro may even be defined as (void *)SIZE_MAX, and it would be up to the compiler to figure out from context that this has the same semantics as 0. The casting code is responsible for noting it is the chosen value, in pointer <--> pointer casts, and supplying what is appropriate as an object or function pointer. Casts from pointer <--> integer, implicit or explicit, have similar check and supply requirements, especially in unions where a (u)intptr_t field overlays a (type *) field. Portable code can guard against compilers not doing this properly with an explicit *(ptr==NULL?(type *)0:ptr) expression.
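
One point above that can be illustrated without relying on any particular platform is that a null pointer need not be stored as all-bits-zero. The sketch below, with hypothetical helper names `is_null` and `bits_all_zero`, contrasts asking the language (`p == NULL`) with inspecting the stored bytes. Only the first is a portable null test; whether the second agrees with it depends on the implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Portable: asks the language whether p is a null pointer. */
bool is_null(const void *p)
{
    return p == NULL;
}

/* Not portable as a null test: inspects the stored representation of the
   pointer. Hypothetical helper, for illustration only. */
bool bits_all_zero(const void *p)
{
    unsigned char zero[sizeof p] = { 0 };
    return memcmp(&p, zero, sizeof p) == 0;
}
```

On common desktop platforms the two functions agree for a null pointer, but the standard only guarantees the behaviour of the first, which is why code that zero-fills a struct with memset() and then treats its pointer members as null is making an assumption about the implementation.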

M. Ziegast