How can I create a reserved pointer value?

The context is this: I have been thinking of how to implement a data structure for a dynamic scripting language (I am not planning on implementing this - just wondering how it would be done).

Strings may contain arbitrary bytes, including NUL. Thus, it is necessary to store the value separately. This requires a pointer (to point to the array) and a number. The first trick is that if the pointer is NULL, it cannot possibly be a valid string, so the number can be used for an actual integer.

If a second reserved pointer value could be created, this could be used to imply that the other field is now being used as a floating-point value. Can this be done?
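
A minimal sketch of the layout described so far, assuming C; the names `value`, `make_int`, `make_str` and `is_int` are hypothetical and only illustrate the NULL trick, with the second reserved value left open:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a pointer plus a number.
 * ptr != NULL: a string of `num` bytes (which may contain NUL bytes).
 * ptr == NULL: `num` holds an integer instead of a length. */
typedef struct {
    const char *ptr;
    int64_t     num;
} value;

static value make_int(int64_t n) {
    value v = { NULL, n };
    return v;
}

static value make_str(const char *bytes, size_t len) {
    assert(bytes != NULL);   /* NULL is reserved for integers */
    value v = { bytes, (int64_t)len };
    return v;
}

/* True if the number field holds an integer rather than a length. */
static int is_int(value v) {
    return v.ptr == NULL;
}
```

For example, `make_str("a\0b", 3)` keeps all three bytes because the length is stored next to the pointer rather than inferred from a terminator.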

One thought is to mmap() an address with no permissions, which could also be done to replace the usage of the NULL pointer.

Demi
  • Why don't you use a tagged union instead? `struct { int type; union { char *string_value; double float_value; } }` –  Dec 21 '13 at 11:18
  • Then length of the string (as a string may contain `'\0'`) would require one more integer field. – nullptr Dec 21 '13 at 11:30
  • Do you want the application to crash upon accessing the reserved node? The accepted answer seems to imply that. Instead of using some address that is known to be invalid, you can use an address that is known to be valid. Just create a global variable `static const string invalid_string;`. You can compare any string's address against this object's address, and you can dereference it _without_ crashing. Note that crashing is _sometimes_ good and desired behavior, but it depends on what you want. If you consider accessing the reserved node a program error, it should really crash. – Damon Dec 21 '13 at 16:53
  • Yes, it would be a logic error to access the pointer if it had the magic value, since this means the struct is storing a number. – Demi Dec 21 '13 at 17:54

3 Answers

7

On any modern system, you can just use the pointer values 1, 2, ... 4095 for such purposes. Another frequent choice is (uintptr_t)-1, which is technically inferior, but used more frequently than 1 nevertheless.

Why are these values "safe"?
Modern systems safeguard against NULL pointer accesses by making it impossible to map anything at virtual address zero. Almost any dereference of a NULL pointer hits this nonexistent region; the hardware notifies the OS, which delivers a segfault to the process.
Since virtual memory mappings are page aligned (pages are at least 4k on current hardware), and nothing is mapped at address zero, nothing can be mapped anywhere in the range 0, ..., 4095. All of these addresses are protected in the same way, so you can use them as special purpose values.
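
As an illustration only, such values could be wrapped like this in C. The names `TAG_FLOAT`, `TAG_BOOL`, `is_reserved` and `string_data` are made up for this sketch, and the 4096 bound assumes the 4k first page discussed above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags: addresses inside the first page. Converting an
 * integer to a pointer is implementation-defined in ISO C, so this
 * relies on the platform behaviour described in the text. */
#define TAG_FLOAT ((const char *)(uintptr_t)1)
#define TAG_BOOL  ((const char *)(uintptr_t)2)

/* True if p is NULL or one of the reserved values 1..4095. */
static int is_reserved(const char *p) {
    return (uintptr_t)p < 4096;
}

/* Checked accessor: refuses to hand out a reserved value as string data. */
static const char *string_data(const char *p) {
    assert(!is_reserved(p));
    return p;
}
```

Using `char *` for the tags avoids any alignment assumption the compiler might make about wider pointer types.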

How much virtual memory space is reserved for this purpose is a system parameter. On Linux it is controlled by /proc/sys/vm/mmap_min_addr, and the root user can change it to zero, which disables this protection (not a very smart idea). The default on Ubuntu is 64k (i.e. 16 pages).

This is also the reason why (uintptr_t)-1 is less safe than 1: even though any load of more than one byte will wrap around into the zero page, the address (uintptr_t)-1 itself is not necessarily protected in this way. Consequently, doing string operations on (char*)-1 does not necessarily segfault.

Edit:
My original explanation with the special mapping seems to have been a bit stale; probably this was the way things were handled on the old Mac/PPC platform. Even though the effect is pretty much the same, I changed the details of the answer to reflect modern Linux. Anyway, the important point is not how the null page protection is achieved, but that any sane, modern system will have some null page protection that encompasses at least the mentioned address range. Some more details can be found in this SO answer: https://stackoverflow.com/a/12645890/2445184

cmaster - reinstate monica
  • Can you cite some source for your answer? I wanted to know/read about it in more detail. – 0xF1 Dec 21 '13 at 11:40
  • Most current hardware cannot address more than 48 bits. On any current Intel processor, accessing -1 will always segfault. – SoapBox Dec 21 '13 at 16:29
  • @0xF1 It was information out of the back of my head, which I confess had become a bit stale, but I updated my answer with the newer information. Sorry if I confused you. – cmaster - reinstate monica Dec 21 '13 at 16:29
  • @SoapBox No, that the hardware cannot address more than 48 bits does not mean that `(uint64_t)-1` is outside the addressable range. As a matter of fact, the x86-64 architecture specifies that the upper 16 bits must be the sign extension of the lower 48 bits, which is the case with `(uint64_t)-1`. An illegal address would be `0x0000ffffffffffffull`. – cmaster - reinstate monica Dec 21 '13 at 16:33
5

In standard C (and standard C++), the approach that's 100% valid and works is simple: declare a variable, use its address as a magic value.

char *ptr;
char magic;
if (ptr == &magic) { ... }

This guarantees that magic will never have any overlap with another object.
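
A slightly fuller sketch of how this could serve the question's float case, under the assumption that `double` is 8 bytes; `float_tag`, `value`, `make_float`, `is_float` and `as_float` are hypothetical names, not part of any library:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 8-byte double");

/* Hypothetical sentinel: its address cannot equal that of any other object. */
static const char float_tag = 0;

typedef struct {
    const char *ptr;   /* string data, NULL (integer), or &float_tag (float) */
    uint64_t    num;   /* length, integer, or the bits of a double */
} value;

static value make_float(double d) {
    value v = { &float_tag, 0 };
    memcpy(&v.num, &d, sizeof d);   /* copy the bits without aliasing issues */
    return v;
}

static int is_float(value v) {
    return v.ptr == &float_tag;
}

static double as_float(value v) {
    assert(is_float(v));
    double d;
    memcpy(&d, &v.num, sizeof d);
    return d;
}
```

Dereferencing `v.ptr` by mistake reads the sentinel's single byte instead of crashing, which is the trade-off against the low-address approach.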

Magic pointer values such as (char *) 1 have their advantages too, but they are easy to get wrong. Even if you disregard the theoretical implementations where (char *) 1 may be a valid object, consider (int *) 1 as a magic pointer value: if the optimiser assumes int * values are suitably aligned, it may remove checks that are no-ops only in 100% valid code, not in your code. I'd therefore recommend the standard approach, optionally switching to magic pointer values temporarily only if you find they help you debug.

  • The one problem with this is that trying to access the variable does not throw an exception or raise a signal, which makes debugging harder. – Demi Dec 21 '13 at 12:00
  • @Demetri Yes, that is why I included the last sentence. :) But in a large number of cases, tools such as valgrind will already detect invalid accesses (not necessarily an access to `magic` directly, that would require special annotations, but read a single byte beyond it, and you'll get a useful message). –  Dec 21 '13 at 12:03
1

mmap()ing an address can fail if the address is already assigned. It would probably be better to use the address of some static variable or function, or to obtain a unique address via malloc(1).

nullptr