0

Is this behavior defined or not?

volatile long (*volatile ptr)[1] = (void*)NULL;
volatile long v = (long) *ptr;

printf("%ld\n", v);

It works because dereferencing a pointer to an array yields the array itself, and that array then decays to a pointer to its first element.

Updated demo: https://ideone.com/DqFF6T

Also, GCC even treats the following code as a constant expression:

volatile long (*ptr2)[1] = (void*)NULL;
enum { this_is_constant_in_gcc = ((void*)ptr2 == (void*)*ptr2) };
printf("%d\n", this_is_constant_in_gcc);

Basically, it dereferences ptr2 at compile time.

Neko Neko
  • 39
  • 3
  • 7
    Your question is very unclear... dereferencing NULL is invalid and will basically always cause a segmentation fault. – Marco Bonelli Jan 17 '20 at 15:44
  • @MarcoBonelli I am not so sure about that. The real question here is: does a dereference really happen at all? – Neko Neko Jan 17 '20 at 15:58
  • 2
    Yes, there is really a dereference happening. `ptr` is a pointer. – KamilCuk Jan 17 '20 at 15:59
  • @KamilCuk. Then why does that code work at all? – Neko Neko Jan 17 '20 at 16:02
  • 2
    Because "working" is a great and perfectly valid example of undefined behavior. Undefined behavior does _not_ mean "it will not work". It means it may work. It may fail. It is _not defined_ what will happen. – KamilCuk Jan 17 '20 at 16:02
  • @NekoNeko may I suggest a better title? Try with: "why is dereferencing a null pointer to array not crashing my program?" – Marco Bonelli Jan 17 '20 at 16:04
  • @NekoNeko I answered that question for you. – Marco Bonelli Jan 17 '20 at 16:06
  • @NekoNeko Why is the array so important here? It has only one element and could be also declared else as usual object initialized with the value of `NULL`. – RobertS supports Monica Cellio Jan 17 '20 at 16:08
  • @RobertS-ReinstateMonica I guess it's important merely because it makes GCC emit code that does not really dereference any NULL pointer... which is interesting if you ask me, even though it's only a result of UB. – Marco Bonelli Jan 17 '20 at 16:09
  • 3
    The argument at the end for why the behavior should be defined actually makes this a somewhat interesting question. – John Bollinger Jan 17 '20 at 16:11
  • There was a long discussion related to this here: https://stackoverflow.com/questions/58858493/legal-or-out-of-bounds-forming-the-address-of-the-first-element-past-the-end/58859364 – dbush Jan 17 '20 at 16:12
  • @RobertS-ReinstateMonica because pointer to array and pointer to something else is a different things. Dereferencing pointer will read data at location where pointer points. When dereferencing pointer to array we receiving an array itself. But since we can't receive an array, it is decayed to pointer to first element. – Neko Neko Jan 17 '20 at 16:30
  • 2
    *because pointer to array and pointer to something else is a different things.* - Technically, they aren't. A pointer points to the address of an object of a certain type, regardless of whether that object is part of an array. There may be implementation-defined reasons for choosing one over the other, but per the standard there is no difference. *When dereferencing pointer to array we receiving an array itself.* No, when you dereference a pointer to an array, you are in fact accessing the first element of that array, as I said above. – RobertS supports Monica Cellio Jan 17 '20 at 16:53
  • 1
    No, @RobertS-ReinstateMonica, NekoNeko is right that when you dereference a pointer to an array, the result is the array itself -- subject to the provision that the pointer is in fact valid in the first place. You two seem to be swapping places back and forth, because this is indeed exactly because a pointer to an array and a pointer to a non-array are *not* fundamentally different things. However, just as it does almost anywhere else that an array-valued expression appears, that array value is then automatically *converted* to a pointer to the first element. – John Bollinger Jan 17 '20 at 17:21
  • Perhaps the crux of your confusion, @NekoNeko, is that the dereference and the subsequent conversion are separate actions, performed sequentially, at least in the abstract machine model with which C is defined. The behavior of the dereference has to be defined on its own, independent of the anticipated conversion of its result, for the behavior of the overall expression to be defined. – John Bollinger Jan 17 '20 at 17:27
  • @JohnBollinger, well, yes, you are right: it is not the pointers that are different, but the array (because of its decay). If we look at abstract C and separate 'dereference' from 'array decay', then the 'dereference' really is undefined behavior. But because of array decay, it is not clear whether a dereference happens at all, because we are not reading real data here. GCC even considers such a dereference + array decay a compile-time constant. – Neko Neko Jan 17 '20 at 20:57
  • 1
    No, @NekoNeko, dereferencing an array pointer is well-defined behavior. It is completely orthogonal to the automatic conversion of the result that happens in *most* circumstances, and this is exactly the point. The dereference absolutely happens (in the abstract machine model) *because the standard says it does*. There is nothing unclear about that. – John Bollinger Jan 17 '20 at 21:29

4 Answers

6

This:

long (*ptr)[1] = NULL;

Is declaring a pointer to an "array of 1 long" (more precisely, the type is long int (*)[1]), with the initial value of NULL. That is fine: any pointer can be NULL.

Then, this:

long v = (long) *ptr;

Is dereferencing the NULL pointer, which is undefined behavior. All bets are off: even if your program does not crash, the following statement could print any value, or do anything else really.

Let me make this clear one more time: undefined behavior means that anything can happen. There is no explanation as to why anything strange happens after invoking undefined behavior, nor does there need to be. The compiler could very well emit 16-bit Real Mode x86 assembly, produce a binary that deletes your entire home folder, emit the Apollo 11 Guidance Computer assembly code, or whatever else. It is not a bug. It's perfectly conforming to the standard.


The only reason your code seems to work is that GCC decides, purely by coincidence, to do the following (Godbolt link):

mov     QWORD PTR [rbp-8], 0    ; put NULL on the stack
mov     rax, QWORD PTR [rbp-8]
mov     QWORD PTR [rbp-16], rax ; move NULL to the variable v

This causes the NULL dereference to never actually happen, which is most probably a consequence of the undefined behavior in dereferencing ptr ¯\_(ツ)_/¯


Funnily enough, I previously said in a comment:

dereferencing NULL is invalid and will basically always cause a segmentation fault.

But of course, since it is undefined behavior, that "basically always" is wrong. I think this is the first time I have ever seen a null-pointer dereference not cause a SIGSEGV.

Marco Bonelli
  • 63,369
  • 21
  • 118
  • 128
  • `long (*ptr)[1] = NULL;` - *Is declaring a pointer to an "array of 1 `long`", with the initial value of `NULL`. Everything fine, any pointer can be `NULL`.* - Does `ptr` have the value of `NULL` or is `NULL` stored in the `long` object and `ptr` just points to the `NULL` value by the statement of `long (*ptr)[1] = NULL;` ? This is what is not so clear to me, now. – RobertS supports Monica Cellio Jan 17 '20 at 16:23
  • @RobertS-ReinstateMonica `ptr` is a variable of type `long (*)[1]`. Yes, that is a very confusing definition. – Marco Bonelli Jan 17 '20 at 16:30
  • @RobertS-ReinstateMonica long (*ptr)[1] = NULL; is a way to tell that there is an array at address 0. – Neko Neko Jan 17 '20 at 16:36
  • No, @NekoNeko. `long (*ptr)[1] = NULL;` is a way to say that `ptr` does not point to any object. That is in fact one of the primary purposes of null pointer constants such as `NULL`. – John Bollinger Jan 17 '20 at 17:13
  • @MarcoBonelli, `mov rax,QWORD PTR [rax+0x8] `, that is strange, because to really read value at address 0 we should write (*ptr)[0] or **ptr (i.e. dereference ptr twice). I'll check this out, but it looks like bug to me. – Neko Neko Jan 17 '20 at 18:00
  • 1
    @NekoNeko I really don't know how to say it anymore: **dereferencing `NULL` is undefined behavior! Anything can happen. There is *no explanation* as to why that happens.** The compiler could very well emit 16-bit Real Mode x86 assembly, produce a binary that deletes your entire home folder, emit the Apollo 11 Guidance Computer source code, or whatever else. It is **NOT** a bug. It's perfectly conforming to the standard. – Marco Bonelli Jan 17 '20 at 18:20
  • @MarcoBonelli No, there is an explanation for why this happens: the array decayed to a pointer. And the generated code is clearly incorrect, because it really reads data at address 0 when it should not. If, for example, ptr pointed to a real array of volatile long (like a memory-mapped register), then that register would be read. – Neko Neko Jan 17 '20 at 20:47
  • @NekoNeko of course, array pointer decay is a thing, but even if the array decayed to pointer the program would definitely *not* access at `+0x8`. Really, you're trying to find an explanation for something that has none. – Marco Bonelli Jan 17 '20 at 20:51
  • @MarcoBonelli well, i'm just curious :) Could you please specify exact ubuntu version, where you got such result, compile options for gcc and exact code you compiled? – Neko Neko Jan 17 '20 at 21:00
  • @NekoNeko Code: the one in the [first version of your question](https://stackoverflow.com/revisions/59790488/1). Ubuntu 18, you have the exact deb package linked above, just install it with `apt install ./xxx.deb` on an Ubuntu 18 VM and you're ready to go. No compiler options. – Marco Bonelli Jan 17 '20 at 21:02
  • @MarcoBonelli ubuntu 18.04 or 18.10 ? – Neko Neko Jan 17 '20 at 22:22
  • @NekoNeko .04 sorry forgot that – Marco Bonelli Jan 17 '20 at 22:23
5

Is this behavior defined or not?

Not.

long (*ptr)[1] = NULL;
long v = (long) *ptr;

printf("%ld\n", v);

It works because dereferencing a pointer to an array yields the array itself, and that array then decays to a pointer to its first element.

No, you are confusing type with value. It is true that the expression *ptr on the second line has type long[1], but evaluating that expression produces undefined behavior regardless of the data type, and regardless of the automatic conversion that would be applied to the result if it were defined.

The relevant section of the spec is paragraph 6.5.3.2/4:

The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ''pointer to type'', the result has type ''type''. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

A footnote goes on to clarify that

[...] Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer [...]

It may "work" for you in an empirical sense, but from a language perspective, any output at all or none is a conforming result.

Update:

It may be interesting to note that the answer would be different for explicitly taking the address of *ptr than it is for supposing that array decay will overcome the undefinedness of the dereference. The standard provides that, as a special case, where the operand of the unary & operator is the result of a unary * operator, neither of those operators is evaluated. Provided that all relevant constraints are satisfied, the result is as if they were both omitted altogether, except that it is never an lvalue.

Thus, this is ok:

long (*ptr)[1] = NULL;
long v = (long) &*ptr;

printf("%ld\n", v);

On many implementations it will reliably print 0, but do note that C does not specify that it must be 0.

The key distinction here is that in this case, the * operation is not evaluated (per spec). The * operation in the original code is evaluated, notwithstanding the fact that if the pointer value were valid, the resulting array would be converted right back to a pointer (of a different type, but to the same location). That does suggest an obvious shortcut that implementations may take with the original code, and they may take it, if they wish, without regard to whether ptr's value is valid because if it is invalid then they can do whatever they want.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • **if it points to an object, the result is an lvalue designating the object** object here is an array and it is successfully returned. – Neko Neko Jan 17 '20 at 16:34
  • 2
    No, @NekoNeko. Setting aside for a moment the fact that the standard explicitly says that a null pointer falls into the "invalid value" category, you are still confusing type and value. The pointer would point to an object if it were valid, but not all pointer values are valid, and null pointers in particular do not point to any object *by definition*. – John Bollinger Jan 17 '20 at 16:38
  • It is curious that the program seems to print 0; I thought that Linux systems were all-but-guaranteed to segfault on null pointer dereference, but this program also prints 0 on my Linux laptop. – ad absurdum Jan 17 '20 at 16:40
  • @exnihilo it will print 0 anywhere, i think. Because there is no 'real' dereference happenning. – Neko Neko Jan 17 '20 at 16:43
  • Welcome to undefined behavior, @exnihilo. Phenomenologically, I suppose we can conclude that whatever your compiler does with that code does not produce a *bona fide* attempt to dereference a null pointer. Marco's answer speaks to that a bit. – John Bollinger Jan 17 '20 at 16:43
  • 2
    @NekoNeko no, it will not print 0 everywhere. It's undefined behavior. *ANY* compliant compiler could very well print `Hello World!` after dereferencing `NULL`. – Marco Bonelli Jan 17 '20 at 16:48
  • @JohnBollinger I understand that it is UB, my point was that the observed behavior is both unusual, and suggestive of the misinterpretation that your answer disabuses OP of. – ad absurdum Jan 17 '20 at 16:48
1

To just answer the questions you asked:

  1. Is dereferencing a NULL pointer to array valid in C?

No.

  2. Is this behavior defined or not?

It is classified as "undefined behavior", so it is not defined.


Never mind that this array trick may appear to work on some implementations, and that there is no real need for it (I assume you are asking out of curiosity): per the C standard it is not valid to dereference a NULL pointer in any way, and doing so causes "Undefined Behavior".


Anything can happen when you put such statements into your program.

Look at the answers on this question, which explain why:

What EXACTLY is meant by "de-referencing a NULL pointer"?

One quote from Adam Rosenfield's answer:

A null pointer is a pointer that does not point to any valid data (but it is not the only such pointer). The C standard says that it is undefined behavior to dereference a null pointer. This means that absolutely anything could happen: the program could crash, it could continue working silently, or it could erase your hard drive (although that's rather unlikely).

0

Is this behavior defined or not?

The behavior is undefined because you are applying the * operator to a pointer that compares equal to a null pointer constant.

The following Stack Overflow thread tries to explain what undefined behavior is: Undefined, unspecified and implementation-defined behavior

KamilCuk
  • 120,984
  • 8
  • 59
  • 111