77

The common folklore says that:

  • The type system exists for a reason. Integers and pointers are distinct types, casting between them is a malpractice in the majority of cases, may indicate a design error and should be avoided.

  • Even when such a cast is performed, no assumptions shall be made about the size of integers and pointers (casting void* to int is the simplest way to make the code fail on x64), and instead of int one should use intptr_t or uintptr_t from stdint.h.

Knowing that, when is it actually useful to perform such casts?

(Note: slightly shorter code at the price of portability doesn't count as "actually useful".)


One case I know:

  • Some lock-free multiprocessor algorithms exploit the fact that a pointer aligned to 2 or more bytes has some redundancy. They then use the lowest bits of the pointer as boolean flags, for instance. With a processor having an appropriate instruction set, this may eliminate the need for a locking mechanism (which would be necessary if the pointer and the boolean flag were separate).
    (Note: This practice is even possible to do safely in Java via java.util.concurrent.atomic.AtomicMarkableReference)
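A minimal sketch of the low-bit flag trick in C, assuming the pointee is at least 2-byte aligned. The names `tag_pack`, `tag_pointer` and `tag_mark` are made up for illustration and do not come from any particular lock-free library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the pointee is at least 2-byte aligned, so bit 0 of its
   address is always zero and is free to carry a flag. */
#define TAG_MASK ((uintptr_t)1)

/* Pack a pointer and a boolean mark into one word. */
static uintptr_t tag_pack(void *p, bool mark)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* the alignment assumption must hold */
    return bits | (mark ? TAG_MASK : 0);
}

/* Recover the original pointer by clearing the flag bit. */
static void *tag_pointer(uintptr_t word)
{
    return (void *)(word & ~TAG_MASK);
}

/* Read the flag bit. */
static bool tag_mark(uintptr_t word)
{
    return (word & TAG_MASK) != 0;
}
```

Because pointer and flag share one word, a lock-free algorithm could presumably keep that word in an atomic variable and update both with a single compare-and-swap.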

Anything more?

Kos
  • 5
    The mapping between a pointer and an `intptr_t` is implementation defined so I wouldn't use the lock-free algorithm either unless I knew exactly which compiler it was going to run on. – Andreas Brinck Aug 22 '11 at 11:39
  • 6
    Every lockfree algorithm exploits at least some implementation-specific properties... – PlasmaHH Aug 22 '11 at 12:16
  • 3
    @PlasmaHH: Good point. C (and C++ before C++11) does not have any notion of multi-threaded programs or shared program memory. Therefore, if you have any use for lockfree algorithms you are already relying on implementation-specific properties, but it is worth remembering this, since it is easy to forget that the implementation is not required to do the "normal" thing here. – Kevin Cathcart Aug 22 '11 at 16:22
  • 1
    Actually, `uintptr_t` is in `<stdint.h>`, or `<cstdint>` in C++0x. Visual C++ 2008 is wrong, if that's where you got it from. – wilhelmtell Aug 25 '11 at 01:47
  • I don't do Visual C++ and that was an obvious mistake by me, thanks! :) – Kos Aug 25 '11 at 11:06

15 Answers

38

I sometimes cast pointers to integers when they somehow need to be part of a hash sum. I also cast them to integers to do some bit fiddling on certain implementations where pointers are guaranteed to always have one or two spare bits, so that I can encode AVL or RB tree information in the left/right pointers instead of having an additional member. But this is all so implementation specific that I recommend never thinking of it as any kind of common solution. I have also heard that hazard pointers can sometimes be implemented with such a thing.
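A rough sketch of hashing a pointer through `uintptr_t`. It assumes objects are at least 8-byte aligned and uses the prime 1000000007 as an arbitrary multiplier; `hash_pointer` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: objects are at least 8-byte aligned, so the low 3 bits
   carry no information and are shifted out before mixing. */
static size_t hash_pointer(const void *p, size_t bucket_count)
{
    assert(bucket_count > 0);
    uintptr_t bits = (uintptr_t)p >> 3;
    /* Multiply by a large prime to spread nearby addresses apart;
       unsigned overflow simply wraps around. */
    bits *= (uintptr_t)1000000007u;
    return (size_t)(bits % bucket_count);
}
```

The result is only meaningful within one run of one process, since addresses generally differ between runs.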

In some situations I need a unique ID per object that I pass along to e.g. servers as my request id. Depending on the context when I need to save some memory, and it is worth it, I use the address of my object as such an id, and usually have to cast it to an integer.

When working with embedded systems (such as Canon cameras, see CHDK) there are often magic addresses, so a (void*)0xFFBC5235 or similar is often found there too.

edit:

Just stumbled (in my mind) over pthread_self(), which returns a pthread_t that is usually a typedef for an unsigned integer. Internally, though, it is a pointer to some thread struct representing the thread in question. In general, the same trick might be used elsewhere for an opaque handle.

PlasmaHH
  • 1
    Instead of casting pointer *values* to integers for hashing, you should instead simply read their *representation* (as `unsigned char [sizeof(T *)]`) to hash... – R.. GitHub STOP HELPING ICE Aug 22 '11 at 12:33
  • 1
    As the OP also pointed out, pointer values often have redundancy, in that the lower bits are 0. Shifting them away and then multiplying by e.g. 1000000007 often yields a surprisingly well-distributed hash that suffices for quite a few of my applications. Additionally, I am not a fan of just blindly adding together bits and bits to form a hash; with a little bit of thought a faster domain-specific hash can be found without rocket-science effort. – PlasmaHH Aug 22 '11 at 12:41
  • 4
    +1 Nice to see you understand the hazards of what you're doing and suggest others not do it :-) I'm shocked that this is upvoted on SO and not getting a bunch of "don't micro-optimize" remarks. – phkahler Aug 22 '11 at 13:57
  • 5
    I love it when people writing my libraries micro optimize for me. It's when I waste time doing the micro optimizations that it's a problem >:) – Steven Schlansker Aug 22 '11 at 19:36
15

It could be useful when checking the alignment of types in general so that misaligned memory gets caught with an assert rather than just SIGBUS/SIGSEGV.

E.g.:

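#include <xmmintrin.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int main(void) {
  void *ptr = malloc(sizeof(__m128));
  /* Compare the remainder explicitly: !x % n would negate x first. */
  assert((uintptr_t)ptr % __alignof__(__m128) == 0);
  free(ptr);
  return 0;
}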

(In real code I wouldn't just gamble on malloc, but it illustrates the point)

Flexo
13

Storing a doubly linked list using half the space

An XOR linked list combines the next and prev pointers into a single value of the same size. It does this by XOR-ing the two pointers together, which requires treating them like integers.
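A small sketch of the idea, not taken from the linked article. The node layout and the helper names `xor_ptrs`, `xor_step` and `xor_sum` are assumptions made for illustration, and the code relies on pointers surviving a round trip through `uintptr_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One link field holds prev XOR next, each taken as an integer. */
struct xnode {
    int value;
    uintptr_t link;
};

/* Combine two neighbour pointers into a single link value. */
static uintptr_t xor_ptrs(struct xnode *a, struct xnode *b)
{
    return (uintptr_t)a ^ (uintptr_t)b;
}

/* Given the node we came from and the current node, return the node
   on the other side (NULL at either end of the list). */
static struct xnode *xor_step(struct xnode *from, struct xnode *cur)
{
    return (struct xnode *)(cur->link ^ (uintptr_t)from);
}

/* Sum the values by walking from one end to the other. */
static int xor_sum(struct xnode *end)
{
    struct xnode *prev = NULL;
    struct xnode *cur = end;
    int sum = 0;
    while (cur != NULL) {
        struct xnode *next = xor_step(prev, cur);
        sum += cur->value;
        prev = cur;
        cur = next;
    }
    return sum;
}
```

The same walk works from either end, since the link is symmetric in its two neighbours.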

Craig Gidney
  • 1
    Forgot about that ;) Cool hack for memory-critical embedded solutions – Kos Aug 22 '11 at 16:22
  • Apart from the fact that such a node can't easily be removed from the list when only a pointer to the node is given. – Maxim Egorushkin Aug 25 '11 at 11:18
  • 1
    Yes, you generally need to know two adjacent nodes in order to traverse or modify the list. You trade space for convenience. It's covered in the linked article. – Craig Gidney Aug 25 '11 at 16:38
8

One example is in Windows, e.g. the SendMessage() and PostMessage() functions. They take a HWnd (a handle to a window), a message (an integral type), and two parameters for the message, a WPARAM and an LPARAM. Both parameter types are integral, but sometimes you must pass pointers, depending on the message you send. Then you will have to cast a pointer to an LPARAM or WPARAM.

I would generally avoid it like the plague. If you need to store a pointer, use a pointer type, if that is possible.

Rudy Velthuis
  • 1
    That's not really a *use* of it, that's just because they're legacy code and that kind of design was common. In a more modern system, you would simply provide multiple callbacks. – Puppy Aug 22 '11 at 11:48
  • I don't do WinAPI so I didn't know that people do that. Do you know whether LPARAM and WPARAM are guaranteed by WinAPI to be big enough to contain a pointer? – Kos Aug 22 '11 at 11:50
  • Conceptually, `LPARAM` is not an integral type, but `LONG_PTR` - a union of a pointer and an integral type. But it's indeed a bit of hackery. @DeadMG: You could, on the `SendMessage` side. But the problem remains with `GetMessage`. You can't overload that because you can't predict what message you'll get. – MSalters Aug 22 '11 at 11:56
  • @MSalters: Today, it may be a LONG_PTR, a few years ago, it was still an integral type (UINT or DWORD, IIRC). You still had to use them to pass pointers. @ DeadMG: it is when you cast. – Rudy Velthuis Aug 22 '11 at 12:10
  • 1
    @Kos: yes, they are guaranteed to be big enough. Otherwise Windows would be severely hampered by the fact people could not send messages with pointer values. Windows uses messages for almost all the GUI stuff. – Rudy Velthuis Aug 22 '11 at 12:11
  • @DeadMG: even in Win64, you use the same calls and still cast pointers to integral types. I don't quite see how one would do this through multiple callbacks. One would probably do it like in .NET: pass argument classes, or pointers to structs containing the arguments. – Rudy Velthuis Aug 22 '11 at 12:41
8

The most useful case in my mind is the one that actually has the potential to make programs much more efficient: a number of standard and common library interfaces take a single void * argument which they will pass back to a callback function of some sort. Suppose your callback doesn't need any large amount of data, just a single integer argument.

If the callback will happen before the function returns, you can simply pass the address of a local (automatic) int variable, and all is well. But the best real-world example for this situation is pthread_create, where the "callback" runs in parallel and you have no guarantee that it will be able to read the argument through the pointer before pthread_create returns. In this situation, you have 3 options:

  1. malloc a single int and have the new thread read and free it.
  2. Pass a pointer to a caller-local struct containing the int and a synchronization object (e.g. a semaphore or a barrier) and have the caller wait on it after calling pthread_create.
  3. Cast the int to void * and pass it by value.

Option 3 is immensely more efficient than either of the other choices, both of which involve an extra synchronization step (for option 1, the synchronization is in malloc/free, and will almost certainly involve some cost since the allocating and freeing thread are not the same).
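Option 3 can be sketched without threads. Here `schedule` and `run_pending` are hypothetical stand-ins for an API like pthread_create that stores a callback plus one `void *` and invokes it later; they are not part of any real library. The conversions are implementation-defined, but round-trip as expected on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an API such as pthread_create: it stores a
   callback and one void* argument, and invokes them later. */
typedef void (*callback_fn)(void *arg);

static callback_fn pending_fn;
static void *pending_arg;

static void schedule(callback_fn fn, void *arg)
{
    pending_fn = fn;
    pending_arg = arg;
}

static void run_pending(void)
{
    if (pending_fn != NULL) {
        pending_fn(pending_arg);
        pending_fn = NULL;
    }
}

static int last_seen;  /* records what the callback received */

static void on_event(void *arg)
{
    /* Unpack: void* -> intptr_t -> int. */
    last_seen = (int)(intptr_t)arg;
}

/* The caller's frame is gone by the time the callback runs, so the
   address of a local would dangle; the value itself travels in the
   pointer instead, with no allocation and no synchronization. */
static void submit(int id)
{
    schedule(on_event, (void *)(intptr_t)id);
}
```

Going through `intptr_t` rather than casting `int` directly to `void *` avoids size-mismatch warnings on platforms where the two differ in width.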

R.. GitHub STOP HELPING ICE
  • 2
    And to think it could be made to be 100% safe by designing these functions take an `union {int i; void* p;}` instead of a `void*`. – Kos Aug 22 '11 at 12:45
  • 2
    More safe, but a lot more annoying to use. Pre-C99 (i.e. without compound literals), passing a `union` required making an ugly temp variable. The POSIX realtime signals interfaces used this approach (`union sigval`) and everybody hates it... – R.. GitHub STOP HELPING ICE Aug 22 '11 at 12:51
6

It's very common in embedded systems to access memory-mapped hardware devices where the registers are at fixed addresses in the memory map. I often model hardware differently in C vs. C++ (with C++ you can take advantage of classes and templates), but the general idea can be used for both.

A quick example: suppose you have a timer peripheral in hardware, and it has 2 32-bit registers:

  • a free-running "tick count" register, which decrements at a fixed rate (e.g. every microsecond)

  • a control register, which allows you to start the timer, stop the timer, enable a timer interrupt when we decrement the count to zero, etc.

(Note that a real timer peripheral is usually significantly more complicated).

Each of these registers is a 32-bit value, and the "base address" of the timer peripheral is 0xFFFF.0000. You could model the hardware as follows:

// Treat these HW regs as volatile
typedef uint32_t volatile hw_reg;

// C friendly, hence the typedef
typedef struct
{
  hw_reg TimerCount;
  hw_reg TimerControl;
} TIMER;

// Cast the integer 0xFFFF0000 as being the base address of a timer peripheral.
#define Timer1 ((TIMER *)0xFFFF0000)

// Read the current timer tick value.
// e.g. read the 32-bit value @ 0xFFFF.0000
uint32_t CurrentTicks = Timer1->TimerCount;

// Stop / reset the timer.
// e.g. write the value 0 to the 32-bit location @ 0xFFFF.0004
Timer1->TimerControl = 0;

There are 100 variations on this approach, the pros and cons of which can be debated forever, but the point here is only to illustrate a common use of casting an integer to a pointer. Note that this code isn't portable, is tied to a specific device, assumes the memory region is not off-limits, etc.

Dan
  • Yes, initialisation of pointers from constants is a good example and very common in embedded. Integer->pointer is the more common of the two conversions, I'd say :) – Kos Aug 22 '11 at 17:07
3

It is never useful to perform such casts, unless you have full knowledge of the behaviour of your compiler+platform combination, and wish to exploit it (your question scenario is one such example).

The reason I say it is never useful is because in general, you don't have control of the compiler, nor full knowledge of what optimisations it may choose to do. Or to put it another way, you aren't able to precisely control the machine code it will generate. So in general, you can't implement this sort of trick safely.

Oliver Charlesworth
  • 1
    You can't implement it in a portable way, but on a specific architecture/compiler you can certainly implement it safely if you understand the details. – phkahler Aug 22 '11 at 13:59
  • You don’t need encyclopedic knowledge of your compiler’s optimizations. If you wanted to prove your use of casts correct, you’d just need to know a few invariants. For example, on all widely used malloc implementations, `(uintptr_t) malloc(n) % 4 == 0` when n > 2. That is useful enough that you can do interesting things with it, and your code will be correct and safe on platforms where the assumed invariant holds. – Jason Orendorff Aug 22 '11 at 15:14
  • 3
    I think C99 guarantees a number of things like: if you cast a pointer to uintptr_t, then later cast the same pointer to uintptr_t, the resulting integer values are the same. That’s enough for such casts to be useful in computing hash codes. A little invariant goes a long way. – Jason Orendorff Aug 22 '11 at 15:24
  • 1
    @JasonOrendorff: C99 doesn't guarantee that. It guarantees that a pointer->uintptr_t->pointer round-trip will yield a pointer that compares equal to the original, but on a conforming implementation with e.g. 48-bit pointers and a 64-bit uintptr_t, something like `uintptr_t asUint = (uintptr_t)somePtr;` might simply write 48 bits and leave the other 16 bits holding arbitrary values. – supercat Oct 17 '17 at 22:50
2

The only time I cast a pointer to an integer is when I want to store a pointer, but the only storage I have available is an integer.

Ian Boyd
  • 3
    And why would you want to do that? Under what situation is it useful? I would just change the storage to be a pointer. – R. Martinho Fernandes Aug 22 '11 at 11:44
  • Maybe something like where in the good old C callback systems there is only a void* available, there might be callback systems that only have a size_t available... – PlasmaHH Aug 22 '11 at 11:52
  • Is size_t always big enough for that? – Flexo Aug 22 '11 at 12:12
  • Yeah, but in all of these cases, you would use a int to pointer cast because you have to, not because it's the right thing to do. It's just a hack. In the case of the good ol' C callback you might pass the address of an integer variable you want to use. – deStrangis Aug 22 '11 at 12:25
  • 2
    @R. Martinho Fernandes: when it's not my code. All `Components` have a `Tag` property, which is an integer. If I want to associate an object/structure/string/pointer with a `Component`, I can do so through the `Tag` property. – Ian Boyd Aug 22 '11 at 14:25
  • 3
    Example: In "classic" MacOS, many structures such as `WindowRecord` had a 4-byte `userInfo` field that you could use to store any information you wanted, and which was commonly used to store a pointer to an auxiliary structure. In such cases you'd have to cast a pointer to int or long (I don't remember which) and back again just to make the compiler happy. – Caleb Aug 22 '11 at 17:36
  • Granted, it would crash and burn horribly once `pointers` and `integers` are no longer the same size - but it's never been an issue; even on a 64-bit operating system. – Ian Boyd Aug 22 '11 at 18:59
2

When is it correct to store pointers in ints? It's correct when you treat it as what it is: The use of a platform or compiler specific behavior.

The problem is only when you have platform/compiler specific code littered throughout your application and you have to port your code to another platform, because you've made assumptions that don't hold true any longer. By isolating that code and hiding it behind an interface that makes no assumptions about the underlying platform, you eliminate the problem.

So as long as you document the implementation, separate it behind a platform independent interface using handles or something that doesn't depend on how it works behind the scenes, and then make the code compile conditionally only on platforms/compilers where it's been tested and works, then there's no reason for you not to use any sort of voodoo magic you come across. You can even include large chunks of assembly language, proprietary API calls, and kernel system calls if you want.

That said, if your "portable" interface uses integer handles, integers are the same size as pointers on the implementation for a certain platform, and that implementation uses pointers internally, why not simply use the pointers as integer handles? A simple cast to an integer makes sense in that case, because you cut out the necessity of a handle/pointer lookup table of some sort.

James O'Doherty
1

You may need to access memory at a fixed known address, then your address is an integer and you need to assign it to a pointer. This is somewhat common in embedded systems. Conversely, you may need to print a memory address and thus need to cast it to integer.
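For the printing direction, a sketch using `uintptr_t` with the `PRIxPTR` format macro from `<inttypes.h>`. The helper name `format_address` is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format an address as hexadecimal text without relying on the
   implementation-defined output of printf("%p"). Returns what
   snprintf returns: the length the full text needs. */
static int format_address(char *buf, size_t size, const void *p)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "0x%" PRIxPTR, (uintptr_t)p);
}
```

Using the matching `PRIxPTR` macro instead of `%x` or `%lx` keeps the format string correct whatever the width of `uintptr_t` is.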

Oh, and don't forget you need to assign and compare pointers to NULL, which is usually a pointer cast of 0L

deStrangis
  • All right then. When you are writing the library routine that prints pointers. Duh! – deStrangis Aug 22 '11 at 11:54
  • In C++ 0 is the nullpointer literal, and no cast is involved. In fact, the bitpattern of a nullpointer does not even need to be the same as of an integer of the same size with a value of 0 ... – PlasmaHH Aug 22 '11 at 12:11
  • Yeah, good point, but that's not the case for C (note I used the word usually). If you are involved with fixed constant memory addresses -the situation that comes to my mind when you need to be doing integer to pointer casts- I'd say you are more likely to be using C than C++. – deStrangis Aug 22 '11 at 12:18
  • And casts are used much less in C++ than in C anyway, where they are essential. – deStrangis Aug 22 '11 at 12:32
1

I have one use for such a thing in network-wide IDs of objects. Such an ID would combine an identification of the machine (e.g. IP address), the process ID and the address of the object. To be sent over a socket, the pointer part of such an ID must be put into an integer wide enough that it survives transport back and forth. The pointer part is only interpreted as a pointer (i.e. cast back to a pointer) in the context where this makes sense (same machine, same process); on other machines or in other processes it just serves to distinguish different objects.

What one needs for this to work is the existence of uintptr_t, and uint64_t as a fixed-width integer type. (Well, it only works on machines that have at most 64-bit addresses :)
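A sketch of what such an ID might look like. The struct layout, the field widths and the names `object_id`, `make_id` and `resolve_id` are illustrative assumptions, not a description of any actual protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The address must fit in the 64-bit wire field. */
_Static_assert(sizeof(uintptr_t) <= sizeof(uint64_t),
               "addresses wider than 64 bits are not supported");

/* Hypothetical layout; field names and widths are illustrative. */
struct object_id {
    uint32_t host;     /* e.g. an IPv4 address */
    uint32_t process;  /* process id on that host */
    uint64_t address;  /* object address widened to 64 bits */
};

static struct object_id make_id(uint32_t host, uint32_t process, void *obj)
{
    struct object_id id = { host, process, (uint64_t)(uintptr_t)obj };
    return id;
}

/* Only the owning host and process may turn the address back into a
   pointer; everyone else gets NULL and treats the id as an opaque key. */
static void *resolve_id(struct object_id id, uint32_t my_host,
                        uint32_t my_process)
{
    if (id.host != my_host || id.process != my_process)
        return NULL;
    return (void *)(uintptr_t)id.address;
}
```

A real wire format would also have to fix the byte order of each field, which this sketch leaves out.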

Jens Gustedt
1

Under x64, one can use the upper bits of pointers for tagging (as only 47 bits are used for the actual pointer). This is great for things like run-time code generation (LuaJIT uses this technique, which is an ancient one, according to the comments). To do this tagging and tag checking you need either a cast or a union, which basically amount to the same thing.
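A hedged sketch of upper-bit tagging via casts. It assumes user-space addresses whose top 16 bits are zero, which is true of current x86-64 user space but is not guaranteed for future hardware. The `hi_tag_*` names are made up, and this is not how LuaJIT itself lays out its values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: addresses fit in the low 48 bits, leaving the top
   16 bits of a 64-bit word free for a tag. */
#define HI_TAG_SHIFT 48
#define HI_ADDR_MASK ((UINT64_C(1) << HI_TAG_SHIFT) - 1)

/* Store a 16-bit tag above the address bits. */
static uint64_t hi_tag_pack(void *p, uint16_t tag)
{
    uint64_t bits = (uint64_t)(uintptr_t)p;
    assert((bits & ~HI_ADDR_MASK) == 0);  /* refuse if the assumption fails */
    return bits | ((uint64_t)tag << HI_TAG_SHIFT);
}

/* Strip the tag to get the usable pointer back. */
static void *hi_tag_pointer(uint64_t word)
{
    return (void *)(uintptr_t)(word & HI_ADDR_MASK);
}

/* Extract the tag. */
static uint16_t hi_tag_value(uint64_t word)
{
    return (uint16_t)(word >> HI_TAG_SHIFT);
}
```

The assertion in `hi_tag_pack` makes the address-width assumption fail loudly instead of silently corrupting a pointer.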

Casting pointers to integers can also be very helpful in memory management systems that make use of binning, i.e. one can easily find the bin/page for an address via some math. An example from a lockless allocator I wrote a while back:

inline Page* GetPage(void* pMemory)
{
    return &pPages[((UINT_PTR)pMemory - (UINT_PTR)pReserve) >> nPageShift];
}
Necrolis
  • 1
    Heh. Mike Pall did not invent this technique. I’m confident it dates back to early Lisp implementations. – Jason Orendorff Aug 22 '11 at 14:42
  • 4
    AMD specifically warns against doing this, as it will break horribly when the address space is expanded. Just like it did on the 68000 when the address space was expanded from 24 to 32 bits. – Bo Persson Aug 22 '11 at 15:29
  • Just to second what Jason said, this technique is truly *ancient*, and has been used in countless language runtimes. – Stephen Canon Aug 22 '11 at 15:48
  • @Bo: got a link for that? Curious as to what else it might contain. Jason: updated to reflect your comment :) – Necrolis Aug 22 '11 at 16:54
  • @Necrolis - Nothing handy, no. This is just my recollections of AMD presenting the architecture and telling us not to redo the old mistakes. Also believe that you can already get hardware using 52 bits (no links here either). "47 bits will be enough for everybody". – Bo Persson Aug 22 '11 at 17:01
  • +1 on Bo's comment for remembering Motorola's pointers being 32-bits, but only 24 of those bits were used to address memory. People were warned not to stuff any sort of metadata in those extra 8 bits. – Ian Boyd Aug 22 '11 at 19:01
  • +1 for interesting technique, and -1 for actually the technique is just a hack which is not *guaranteed* to be safe by architecture. – eonil Aug 17 '12 at 12:14
  • 1
    @Eonil: obviously if you are going to be doing pointer tagging or making memory-management systems, you would need to know your underlying architecture, my answer is mainly focusing on x86, and under x86_64, all address space is linear, so it *is* guaranteed :) – Necrolis Aug 17 '12 at 14:34
0

I've used such systems when I'm trying to walk byte-by-byte through an array. Often times, the pointer will walk multiple bytes at a time, which causes problems that are very difficult to diagnose.

For example, int pointers:

int* my_pointer;

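my_pointer++ will advance it by sizeof(int) bytes (4 on a typical 32-bit system). However, my_pointer = (int*)((uintptr_t)my_pointer + 1) will advance it by one byte. (A cast expression is not an lvalue in standard C, so ++ cannot be applied to the cast itself.)

It's really the only way to accomplish it, other than casting your pointer to a (char*): my_pointer = (int*)((char*)my_pointer + 1)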

Admittedly, the (char*) is my usual method since it makes more sense.

Richard
0

Pointer values can also be a useful source of entropy for seeding a random number generator:

int* p = new int();
seed(intptr_t(p) ^ *p);
delete p;

The boost UUID library uses this trick, and some others.

Inverse
  • It is not guaranteed that in subsequent runs `new int()` (btw, initialization is unnecessary) produces a different value. There are well defined sources of entropy, such as `/dev/random` – Maxim Egorushkin Aug 25 '11 at 11:15
0

There is a good old tradition of using a pointer to an object as a typeless handle. For instance, some people use it to implement the interaction between two C++ units through a flat C-style API. In that case, the handle type is defined as one of the integer types, and any method has to convert a pointer into an integer before it can be transferred to another method that expects an abstract typeless handle as one of its parameters. In addition, sometimes there is no other way to break up a circular dependency.
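A minimal sketch of such a handle-based C API. It assumes every object handed out begins with an `int` type tag, and the names `obj_handle`, `WIDGET_TYPE_ID` and `widget_*` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t obj_handle;   /* hypothetical opaque handle type */

enum { WIDGET_TYPE_ID = 0x57494447 };  /* arbitrary tag value */

/* Assumption: every object exposed through a handle starts with an
   int type tag, so the tag can be read before the type is known. */
struct widget {
    int type_id;
    int width;
};

/* Hand the object out as an integer that hides its type. */
static obj_handle widget_to_handle(struct widget *w)
{
    return (obj_handle)w;
}

/* Convert back, refusing handles that do not refer to a widget. */
static struct widget *widget_from_handle(obj_handle h)
{
    struct widget *w = (struct widget *)h;
    if (w == NULL || w->type_id != WIDGET_TYPE_ID)
        return NULL;
    return w;
}

/* Flat C-style entry point: returns -1 for a handle of the wrong type. */
static int widget_get_width(obj_handle h)
{
    struct widget *w = widget_from_handle(h);
    return w ? w->width : -1;
}
```

The runtime tag check only guards against handles to other tagged objects. A stale or arbitrary integer still leads to undefined behavior when dereferenced.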

Sergey Shamov
  • I can't imagine such situation... Could you possibly provide a code sample? – Kos Aug 24 '11 at 08:05
  • It cannot be imagined, because it is not an abstract situation. It's a very concrete case that is hard to illustrate with a short sample. The general rule is: if you can implement the interaction without typeless handles, do not use them. But one day you could face the fact that there is no other way. In that case, use them without hesitation. It's a legitimate approach if you check the type of the objects at runtime after casting the pointers back from integers (for instance, with a get_type_id() method). – Sergey Shamov Aug 24 '11 at 09:11
  • They often use a union of a pointer and an integer. See `struct epoll_data` for [epoll_ctl](http://www.kernel.org/doc/man-pages/online/pages/man2/epoll_ctl.2.html) for example. – Maxim Egorushkin Aug 25 '11 at 11:12
  • The unions are useful in such situation. It's like a casting, but a set of target types is limited. – Sergey Shamov Aug 26 '11 at 07:17