34

In C, dereferencing the null pointer is undefined behavior; however, the null pointer value has a bit representation that, on some architectures, makes it point to a valid address (e.g. the address 0).
Let's call this address the null pointer address, for the sake of clarity.

Suppose I want to write a piece of software in C, in an environment with unrestrained access to memory. Suppose further I want to write some data at the null pointer address: how would I achieve that in a standard compliant way?

Example case (IA32e):

#include <stdint.h>

int main()
{
   uintptr_t zero = 0;

   char* p = (char*)zero;

   return *p;
}

This code when compiled with gcc with -O3 for IA32e gets transformed into

movzx eax, BYTE PTR [0]
ud2

due to UB (0 is the bit representation of the null pointer).

Since C is close to low level programming, I believe there must be a way to access the null pointer address and avoid UB.


Just to be clear
I'm asking about what the standard has to say about this, NOT how to achieve this in an implementation-defined way.
I know the answer for the latter.

edmz
Margaret Bloom
  • 5
    null pointer and the address 0x0 are not the same. – 2501 Feb 21 '16 at 14:51
  • 1
    I think you should try this with a compiler for the intended environment. – Martin Zabel Feb 21 '16 at 14:51
  • 2
    There is no standard-compliant way to do this since the standard provides no way to access arbitrary memory. You will have to do something implementation-dependent. Check your compiler documentation to see what your implementation allows. – Raymond Chen Feb 21 '16 at 15:00
  • @2501 I know, I guess. I just not wanted this question to be too abstract. Can I ask you to elaborate with some terminology just to be sure I didn't get things wrong? – Margaret Bloom Feb 21 '16 at 15:01
  • Your linker for such an environment should allow you to define a section starting at 0. – Martin James Feb 21 '16 at 15:03
  • Don't change the question, please mention what you edited, it makes the answers look weird. – Van Tr Feb 21 '16 at 15:33
  • @2501: To be more precise, citing the standard: "An integer constant expression with the value 0, or ...". So, `0` is only a _null pointer constant_; this is different from a _null pointer_. – too honest for this site Feb 21 '16 at 15:39
  • 1
    "I'm asking about what the standard has to say about this, NOT how to achieve this in a implementation defined way." - This results in no answer at all. Because the conversion of the the `0` assigned to the integer type `zero` to the pointer is already undefined behaviour. The standard only allows conversion of a pointer to this type and back. Even using a different pointer type is UB already. – too honest for this site Feb 21 '16 at 15:42
  • @Olaf I'd think a *null pointer constant* would have to be a *null pointer*. – Andrew Henle Feb 21 '16 at 15:46
  • @2501: No! It is left to the implementation how to test this. The standard just requires a _null pointer_ to yield `0` if used in a condition. Strange you now write the exact opposite of what you wrote some comments ago: "The value of null pointer is always 0 but it's bit representation is **not**." – too honest for this site Feb 21 '16 at 15:48
  • @2501 *The value of null pointer is also 0, otherwise it could not be used in an if statement where it is implicitly compared to 0.* No. The standard clearly says merely that *Any two null pointers shall compare equal.* and *An integer constant expression with the value 0 ... is a null pointer constant*. It leaves the actual values of any null pointer implementation-defined, with multiple values possible. – Andrew Henle Feb 21 '16 at 15:49
  • @AndrewHenle How does an if statement work then, `if( pointer )`, where pointer is a null pointer? – 2501 Feb 21 '16 at 15:53
  • @AndrewHenle: No. `0` is a null pointer constant only is pointer context. But a _null pointer_ is a pointer variable which equals a _null pointer constant_. (I really hate it C11 here did not follow C++11 and provide a specific keyword like _Nullptr - with a header+macro `nullptr`). Other languages like Pascal were more inteligent from the start. – too honest for this site Feb 21 '16 at 15:53
  • @2501: The implementation e.g could use a bit-test (assuming a null pointer just has a bit set which is otherwise cleared). How is something like `_Bool b = 5;` converted? – too honest for this site Feb 21 '16 at 15:55
  • @Olaf Ok, I see what you are trying to say. `int i = null_pointer; i== 0` comparison might yield anything, but comparison `null_pointer == 0` will always yield true. – 2501 Feb 21 '16 at 15:57
  • @2501: `int i = null_pointer` is implementation defined, assuming `null_pointer` is a pointer type. If you mean `0`, that is only a _null pointer constant_ in pointer context, otherwise it is an _integer constant_ (other languages call it an _integer literal_). I spare us another rant about having this unnecessary ambivalence. Note that C++11 introduced `nullptr` exactly to get rid of this hack (which are worse in C++, as you have to cast `void *` to a pointer, thus cannot have `#define NULL ((void *)0)` like in C. FYI: gcc uses a built-in name for longer time in that macro already. – too honest for this site Feb 21 '16 at 16:04
  • @Olaf *No. 0 is a null pointer constant only is pointer context.* OK. I was assuming an implied pointer context. – Andrew Henle Feb 21 '16 at 16:06
  • @AndrewHenle Well so was I. – 2501 Feb 21 '16 at 16:06
  • @Olaf I retract my statement my first comment about pointer values. It doesn't make sense to say it does have a value as it can only be compared with other pointers and 0. I think I won't be using value with pointers anymore as it is meaningless. – 2501 Feb 21 '16 at 16:07
  • 2
    @2501: Well, it depends. In a plain standard-compliant context, the value of a pointer is quite meaningless. Basically, a pointer can be a _null pointer_, or point into an "array" (which includes single objects which are arrays of length 1 for this). Either way, the actual bit-representation is implementation-specific. And comparing two pointers is only allowed for _null pointers_ or if they point into the same "array" - or exactly past the last element. But for e.g. embedded systems you have to "bend the rules" and rely on a specific, i.e. implementation-defined behaviour. – too honest for this site Feb 21 '16 at 16:16
  • @IlDivinCodino There are two answer, and they still make sense (at least one..). I edited for a better clarity, caring not to change the meaning. – Margaret Bloom Feb 21 '16 at 16:40
  • 1
    The question is ambiguous with the edit: do you want to know how to achieve it in a standard compliant way or what the standard has to say about that? – edmz Feb 21 '16 at 22:00
  • @Black What's the difference? An answer of the kind "You can use this code" or "No, you can't do it" would not be satisfactory without a reference to the appropriate lines from the standard. – Margaret Bloom Feb 22 '16 at 07:54
  • The only compliant part here is ` * p= 0;`, every implementation is required to stuff the `null` pointer in `p` regardless of implementation or bit patterns. Everything else will fall under UB. – H H Nov 29 '16 at 09:37
  • Do note that the standard creates just enough room between specification and implementation so that a compiler could reverse the direction of memory: `p++` could lower `p`. As long as all operators, including comparison, are in on it. – H H Nov 29 '16 at 09:39
  • I believe the answer is always `volatile`. Some moderators don't like the answers based on that and will remove these. – curiousguy May 24 '19 at 22:54

5 Answers

23

I read (part of) the C99 standard to clear my mind. I found the sections that are of interest for my own question and I'm writing this as a reference.

DISCLAIMER
I'm an absolute beginner; 90% or more of what I have written is wrong, makes no sense, or may break your toaster. I also try to make a rationale out of the standard, often with disastrous and naive results (as stated in the comments).
Don't read.
Consult @Olaf, for a formal and professional answer.

In what follows, the term architectural address designates a memory address as seen by the processor (logical, virtual, linear, physical or bus address). In other words, the addresses that you would use in assembly.


In section 6.3.2.3. it reads

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

and regarding integer to pointer conversion

An integer may be converted to any pointer type. Except as previously specified [i.e. for the case of null pointer constant], the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

These imply that a compiler, to be compliant, needs only to implement a function int2ptr from integers to pointers such that

  1. int2ptr(0) is, by definition, the null pointer.
    Note that int2ptr(0) is not mandated to be 0. It can be any bit representation.
  2. int2ptr(n != 0) has no constraints.
    Note that this means that int2ptr need not be the identity function, nor a function that returns valid pointers!
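To make the two points concrete, here is a minimal sketch of what the quoted paragraphs (plus the uintptr_t round-trip guarantee) do and do not promise. The function names are hypothetical and chosen only for illustration. The first two checks hold on any conforming implementation that provides uintptr_t; the result of the third is implementation-specific, which is exactly the point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Guaranteed: converting a null pointer constant yields a null pointer. */
int null_constant_is_null(void) {
    char *p = (char *)0;   /* 0 here is a null pointer constant */
    return p == NULL;
}

/* Guaranteed (when uintptr_t exists): void* -> uintptr_t -> void*
   compares equal to the original pointer. */
int round_trip_ok(void) {
    static char object;
    void *original = &object;
    uintptr_t as_integer = (uintptr_t)original;
    void *back = (void *)as_integer;
    return back == original && back != NULL;
}

/* NOT guaranteed: whether a null pointer is stored as all-bits-zero.
   This returns 1 on many mainstream platforms, but the standard
   allows any representation. */
int null_is_all_bits_zero(void) {
    char *p = NULL;
    unsigned char zeros[sizeof p] = {0};
    return memcmp(&p, zeros, sizeof p) == 0;
}
```

Nothing in this sketch says which architectural address, if any, a pointer value corresponds to.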

Given the code below

char* p = (char*)241;

The standard makes absolutely no guarantee that the expression *p = 56; will write to the architectural address 241.
And so it gives no direct way to access any other architectural address (including int2ptr(0), the address designated by a null pointer, if valid).

Simply put the standard does not deal with architectural addresses, but with pointers, their comparison, conversions and their operations.

When we write code like char* p = (char*)K we are not telling the compiler to make p point to the architectural address K; we are telling it to make a pointer out of the integer K, or in other words to make p point to the (C abstract) address K.

The null pointer and the (architectural) address 0x0 are not the same (cit.), and the same is true for any other pointer made from the integer K and the (architectural) address K.

For some reason (a childhood legacy), I thought that integer literals in C could be used to express architectural addresses. I was wrong: that only happens to be (sort of) correct in the compilers I was using.

The answer to my own question is simply: there is no standard way, because there are no (architectural) addresses in the C standard document. This is true for every (architectural) address, not only the int2ptr(0) one [1].


Note about return *(volatile char*)0;

The standard says that

If an invalid value [a null pointer value is an invalid value] has been assigned to the pointer, the behavior of the unary * operator is undefined.

and that

Therefore any expression referring to such a [volatile] object shall be evaluated strictly according to the rules of the abstract machine.

The abstract machine says that * is undefined for null pointer values, so that code shouldn't differ from this one

return *(char*)0;

which is also undefined.
Indeed they don't differ, at least with GCC 4.9: both compile to the instructions stated in my question.

The implementation-defined way to access the 0 architectural address is, for GCC, the use of the -fno-isolate-erroneous-paths-dereference flag, which produces the "expected" assembly code.
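As an alternative to the flag, a comment under this answer suggests routing the address through a volatile integer so that the compiler cannot assume its value. Below is a minimal sketch of that idea. It is an assumption-laden illustration, not something the standard guarantees: the helper name read_byte_at is hypothetical, and the code relies on an implementation (such as GCC on a flat address space) where integer-to-pointer conversion maps the integer to the architectural address.

```c
#include <stdint.h>

/* Hypothetical helper: read one byte from an address given as an integer.
   The integer-to-pointer mapping used here is implementation-defined. */
unsigned char read_byte_at(uintptr_t address) {
    /* Copying through a volatile object forces a reload, so the compiler
       cannot prove that the value is 0 and treat the access as UB. */
    volatile uintptr_t opaque = address;
    volatile unsigned char *p = (volatile unsigned char *)(void *)opaque;
    return *p;
}
```

Calling read_byte_at(0) then emits an ordinary load from address 0, at the cost of one extra memory operation for the volatile copy. Whether that load succeeds depends entirely on the environment.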


The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.

Unfortunately, the standard says that & yields the address of its operand. I believe this is a bit improper; I would say that it yields a pointer to its operand. Consider a variable a that is known to reside at address 0xf1 in a 16-bit address space, and consider a compiler that implements int2ptr(n) = 0x8000 | n. &a would yield a pointer whose bit representation is 0x80f1, which is not the address of a.

[1] Which was special to me because it was the only one, in my implementations, that couldn't be accessed.

Margaret Bloom
  • I think you got the basic idea here. Essentially, you shouldn't think of pointers as "addresses in memory", and that avoids most of the misconceptions. – Kerrek SB Feb 21 '16 at 20:57
  • 1
    This seems to work: `volatile uintptr_t addr = 0; return *(volatile char *)(addr);`. But it causes an extra memory operation to be emitted. It may be best to write accesses to address 0 directly in machine code. – Kerrek SB Feb 24 '16 at 09:37
  • Addresses aren't just numbers. See my many questions (mostly badly received) about pointers in C and C++ like [Are pointer variables just integers with some operators or are they “symbolic”?](https://stackoverflow.com/q/32045888/963864) – curiousguy May 25 '19 at 00:10
15

As OP has correctly concluded in her answer to her own question:

There is no standard way because there are no (architectural) address in the C standard document. This is true for every (architectural) address, not only the int2ptr(0) one.

However, a situation where one would want to access memory directly is likely one where a custom linker script is employed. (I.e. some kind of embedded systems stuff.) So I would say, the standard compliant way of doing what OP asks would be to export a symbol for the (architectural) address in the linker script, and not bother with the exact address in the C code itself.

A variation of that scheme would be to define a symbol at address zero and simply use that to derive any other required address. To do that add something like the following to the SECTIONS portion of the linker script (assuming GNU ld syntax):

_memory = 0;

And then in your C code:

extern char _memory[];

Now it is possible to e.g. create a pointer to the zero address using for example char *p = &_memory[0]; (or simply char *p = _memory;), without ever converting an int to a pointer. Similarly, int addr = ...; char *p_addr = &_memory[addr]; will create a pointer to the address addr without technically casting an int to a pointer.
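The indexing step can be sketched in isolation. This is only an illustration under stated assumptions: address_from_base is a hypothetical helper, and in a real build the base argument would be the _memory symbol exported by the linker script rather than an ordinary array.

```c
#include <stddef.h>

/* Hypothetical helper: derive a pointer from a base symbol and an offset
   using array indexing only, with no integer-to-pointer cast. */
char *address_from_base(char *base, size_t offset) {
    return &base[offset];
}
```

With the linker script above, address_from_base(_memory, addr) would be expected to yield a pointer to the architectural address addr. That result depends on the linker and the target, not on anything the C standard promises.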

(This of course avoids the original question, because the linker is independent from the C standard and C compiler, and every linker might have a different syntax for its linker script. Also, the generated code might be less efficient, because the compiler is not aware of the address being accessed. But I think this still adds an interesting perspective to the question, so please forgive the slightly off-topic answer.)

CliffordVienna
  • Note that it may be necessary to disable certain optimizations when using such constructs, and some compilers that can't disable such optimizations may not be able to support such constructs reliably at all. For example, given `char *p = _memory; ... if (p) ...` or even `if ((uintptr_t)p)` a compiler might decide that `p`'s address can't possibly match that of a null pointer (since it was assigned the address of `_memory`) and omit the comparison, causing unknowable amounts of mayhem. – supercat Oct 17 '18 at 15:47
  • 1
    This. Not only this is a correct answer to the question, it is the only correct way to deal with data that should be placed to specific fixed platform-dependent memory addresses. Using hardcoded pointers is common, but wrong. – Igor Zhirkov Mar 26 '19 at 23:24
3

Any solution is necessarily going to be implementation-dependent. ISO C does not describe the environment a C program runs on; rather, it describes what a conforming C program looks like across a variety of environments («data-processing systems»). The Standard cannot guarantee what you would get by accessing an address that does not belong to an object or array of objects, i.e. something you visibly allocated rather than something the environment provides.

Therefore, I would use something the standard leaves as implementation-defined (and even as conditionally-supported) rather than undefined behavior*: Inline assembly. For GCC/clang:

asm volatile("movzbl 0, %%eax" : : : "eax"); /* AT&T syntax: loads the byte at address 0, cf. *(char*)0 */

It is also worth mentioning freestanding environments, the kind you seem to be in. The standard says about this execution model (emphasis mine):

§ 5.1.2

Two execution environments are defined: freestanding and hosted. [...]

§ 5.1.2.1, paragraph 1

In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined. [...]

Notice it doesn't say you can access any address at will.


* Whatever that could mean. Things are a bit different when you are the implementation the standard delegates control to.

All quotes are from draft N1570.

edmz
  • 1
    The Standard does not require that any implementation be suitable for any particular purpose. Indeed, the authors recognize (in the Rationale) that an implementation could be simultaneously conforming and useless. While freestanding implementations aren't required to define *any* means via which a program could behave in a manner distinguishable from `int main(void) { volatile int dummy; while(!dummy) {} }` quality freestanding implementations will define useful behaviors even in cases where the Standard would not require it. – supercat Aug 13 '18 at 14:55
3

The C Standard does not require that implementations have addresses that resemble integers in any way shape or form; all it requires is that if types uintptr_t and intptr_t exist, the act of converting a pointer to uintptr_t or intptr_t will yield a number, and converting that number directly back to the same type as the original pointer will yield a pointer equal to the original.

While it is recommended that platforms which use addresses that resemble integers should define conversions between integers and addresses in a fashion that would be unsurprising to someone familiar with such mapping, that is not a requirement, and code relying upon such a recommendation would not be strictly conforming.

Nonetheless, I would suggest that if a quality implementation specifies that it performs integer-to-pointer conversion by a simple bitwise mapping, and if there may be plausible reasons why code would want to access address zero, it should regard statements like:

*((uint32_t volatile*)0) = 0x12345678;
*((uint32_t volatile*)x) = 0x12345678;

as a request to write to address zero and address x, in that order even if x happens to be zero, and even if the implementation would normally trap on null pointer accesses. Such behavior isn't "standard", insofar as the Standard says nothing about the mapping between pointers and integers, but a good quality implementation should nonetheless behave sensibly.
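A minimal sketch of how such stores might be wrapped follows. The helper name store32 is hypothetical, and the code assumes the kind of implementation described above, one that documents a simple bitwise integer-to-pointer mapping. The address must also be suitably aligned for uint32_t.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value at an address given as an
   integer. The volatile qualifier asks the compiler to perform the store
   exactly as written and to keep volatile accesses in program order. */
void store32(uintptr_t address, uint32_t value) {
    volatile uint32_t *target = (volatile uint32_t *)(void *)address;
    *target = value;
}
```

On such an implementation, store32(0, 0x12345678) followed by store32(x, 0x12345678) would be expected to issue the two writes in that order. Whether a compiler honors this for address zero is a quality-of-implementation matter and is not required by the Standard.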

supercat
1

I'm assuming the question you are asking is:

How do I access memory such that a pointer to that memory has the same representation as the null pointer?

According to a literal reading of the Standard, this is not possible. 6.3.2.3/3 says that any pointer to an object must compare unequal to the null pointer.

Therefore this pointer we are talking about must not point to an object. But the dereference operator *, applied to an object pointer, only specifies the behaviour in the case that it points to an object.


Having said that, the object model in C has never been specified rigorously, so I would not put too much weight into the above interpretation. Nevertheless, it seems to me that whatever solution you come up with is going to have to rely on non-standard behaviour from whichever compiler is in use.

We see an example of this in the other answers in which gcc's optimizer detects an all-bits-zero pointer at a late stage of processing and flags it as UB.

M.M
  • Even if I asked to access address 100, that could not be done in a C standard way. Though I found this problem due to being unable to access the null pointer address, this is not an issue with pointers with value 0, this is an issue with pointers of any value. The integer constants simply doesn't specify machine addresses (the map is implementation defined) and that's what I was missing. As for the implementation specific way, with GCC integer constants actually do specify addresses and *-fno-isolate-erroneous-paths-dereference* prevent the generation of the `ud2` trap. – Margaret Bloom Feb 21 '16 at 21:36
  • 1
    The implementation may define a conversion `(char *)100` . I think that is a separate issue – M.M Feb 21 '16 at 21:42
  • A `NULL` pointer is "guaranteed to compare unequal to a pointer to any object or function" which implies that the compiler can never generate an object whose address is the location of the null pointer, it does not imply that there can not actually be an object at that location (only that you cannot take a (valid) pointer to that object). Accessing a valid object at the `NULL` address is *implementation-defined* but not *undefined* behaviour. `NULL` may point to a valid *object*, it simply wouldn't be a valid *pointer* (treated the same as if it was misaligned, i.e. implementation defined). – yyny Jul 31 '20 at 14:33
  • @yyny null pointers don't point to a location (in the abstract machine, which is how C is defined) – M.M Aug 01 '20 at 00:10