3

I want to use a long integer that will be interpreted as a number when the MSB is set; otherwise it will be interpreted as a pointer. Would this work, or would I run into problems in either C or C++?

This is on a 64-bit system.

Edited for clarity and a better description.

  • 3
    On what system? Why do you want to do such a thing in the first place? – Carl Norum May 20 '13 at 20:51
  • do you count from the MSB or the LSB? – Andrei May 20 '13 at 20:52
  • 2
    Do you have an extremely compelling reason to not use an extra byte for the extra information and skip the packing into a pointer? – Timothy Shields May 20 '13 at 20:53
  • Linux, sorry I forgot to add that. It would be the MSB. My initial guess is that this bit wouldn't be used, as you would need an unrealistically enormous amount of RAM for those memory addresses to even exist. –  May 20 '13 at 20:54
  • 2
    Do you even know that your pointer has a 63rd bit? – Joseph Mansfield May 20 '13 at 20:54
  • @JacksonForce - physical RAM and locations in virtual memory need not be consistent. It would be entirely reasonable, for example, for an implementation to decide that the 63rd bit marks some special part of the virtual address space. – Chris Stratton May 20 '13 at 20:56
  • [You want to do this?](http://en.wikipedia.org/wiki/Mac_OS_memory_management#32-bit_clean) – Mooing Duck May 20 '13 at 20:56
  • Who cares how much RAM you have? ASLR could get you at any time. – Carl Norum May 20 '13 at 20:56
  • @TimothyShields I'm not sure exactly what you mean here. I want to use a single long int and check if the first bit is set before using it. –  May 20 '13 at 20:57
  • @JacksonForce - it might work sometimes, but it's not safe without more constraint of target platform, because you don't know how an unspecified implementation of "Linux" lays out the virtual address space, or even (from the limited information given) how long a pointer is. – Chris Stratton May 20 '13 at 20:59
  • @CarlNorum ok so there's nothing in C/C++ that limits the range of a pointer in a 64-bit system? i.e. there's no reason to assume that the last bit will never be set in a pointer? –  May 20 '13 at 20:59
  • 3
    Much better to encode the information in the least significant bits. And use a suitably aligned allocation. – David Heffernan May 20 '13 at 21:00
  • @JacksonForce: Actually, as Carl mentions, there's _every_ reason to assume that the last bits _are_ used in a pointer. – Mooing Duck May 20 '13 at 21:00
  • @JacksonForce There is something in C/C++ that limits the range of a pointer `p` in a 64-bit system: `0 <= p < 2^64`. – Timothy Shields May 20 '13 at 21:01
  • "**ever**" is a really long time! – Nik Bougalis May 20 '13 at 21:01
  • @TimothyShields - is there something that says a 64-bit system necessarily has 64-bit pointers? – Chris Stratton May 20 '13 at 21:02
  • @ChrisStratton Nope. Who said that? Not me. :) – Timothy Shields May 20 '13 at 21:03
  • 1
    If you are only dealing with pointer values returned by `malloc()` or `new`, then on most systems, neither the 0th nor the 1st bit will be set. – jxh May 20 '13 at 21:24
  • @user315052: True, but not necessarily useful. Pointers to successive elements of an array of `char` will have the low-order bit alternately set and cleared (assuming a typical addressing scheme). – Keith Thompson May 20 '13 at 21:42
  • @KeithThompson: Yes, hence my qualification. – jxh May 20 '13 at 21:45
  • Which bit do you refer to as the 63rd bit? The most significant? The least significant? – David Rodríguez - dribeas May 20 '13 at 22:22
  • The answers explain why the MSB should not be used for pointer packing, but the LSB can be a good candidate for pointer packing if the pointed-to type is larger than 1 byte. – Praxeolitic Aug 18 '17 at 09:53
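
The low-bit approach suggested in the comments above could look roughly like the sketch below. `TaggedWord` and its helper functions are hypothetical names made up for illustration. It assumes that converting a suitably aligned pointer to `std::uintptr_t` yields a value whose lowest bit is zero, which holds on typical flat-address platforms but is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: one machine word that holds either a pointer to
// an object aligned to at least 2 bytes (low bit 0) or an unsigned integer
// with one bit fewer than std::uintptr_t (low bit 1).
struct TaggedWord {
    std::uintptr_t bits;
};

inline TaggedWord from_pointer(void* p) {
    const auto raw = reinterpret_cast<std::uintptr_t>(p);
    // Assumption: aligned pointers convert to integers with a clear low bit.
    assert((raw & 1u) == 0 && "pointer must be at least 2-byte aligned");
    return TaggedWord{raw};
}

inline TaggedWord from_integer(std::uintptr_t value) {
    // The top bit of value is shifted out, so callers must stay in range.
    return TaggedWord{(value << 1) | 1u};
}

inline bool is_integer(TaggedWord w) { return (w.bits & 1u) != 0; }

inline std::uintptr_t as_integer(TaggedWord w) {
    assert(is_integer(w));
    return w.bits >> 1;
}

inline void* as_pointer(TaggedWord w) {
    assert(!is_integer(w));
    return reinterpret_cast<void*>(w.bits);
}
```

As noted in the comments, this only holds for pointers you know to be aligned (for example, ones returned by `malloc()` or `new`), not for arbitrary `char*` values into the middle of a buffer.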

9 Answers

14

On x86-64, any pointer whose address needs more than 47 bits WILL have the 63rd bit set, since all the bits above the "max number of bits supported by the architecture" (which is currently 48) must have the same value as the most significant supported bit. (That is, any address above 0000 7FFF FFFF FFFF will be at or above FFFF 8000 0000 0000; everything in between is "invalid" as a pointer.)

That may well be addresses ONLY used by the kernel, but I'm not sure it's guaranteed to be.

However, I would try to avoid using tricks like this - it's likely to come back and haunt you at some point.
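
To make the sign-extension rule in this answer concrete, here is a minimal sketch. It assumes 48 implemented address bits, as stated above; `kAddressBits`, `is_canonical` and `in_upper_half` are illustrative names, not part of any real API.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 48 implemented virtual-address bits (bits 0..47).
constexpr unsigned kAddressBits = 48;

// True if bits 47..63 all have the same value, i.e. the address is the
// sign extension of its low 48 bits.
constexpr bool is_canonical(std::uint64_t addr) {
    const std::uint64_t upper = addr >> (kAddressBits - 1);  // 17 bits: 47..63
    return upper == 0 || upper == 0x1FFFF;
}

// For a canonical address, bit 63 equals bit 47, so the MSB tells you which
// half of the address space the pointer lies in. It is not a free flag bit.
inline bool in_upper_half(std::uint64_t addr) {
    assert(is_canonical(addr));
    return (addr >> 63) != 0;
}
```

With these definitions, `0x00007FFFFFFFFFFF` and `0xFFFF800000000000` are canonical, while `0x0000800000000000` is not.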

Mats Petersson
  • 2
    [Here's where it bit a _lot_ of Mac programmers](http://en.wikipedia.org/wiki/Mac_OS_memory_management#32-bit_clean) – Mooing Duck May 20 '13 at 21:01
  • @MooingDuck Indeed, that was exactly why AMD, when they designed the 64-bit architecture, made the rule of "the unused bit has to match the top used bit" - in other words, you are ensuring nobody decides to stuff some "good" bits in the top bits. Someone learned that the hard way, I expect. – Mats Petersson May 20 '13 at 21:03
6

People have tried tricks like this before.

It never works out well in the long run.

Simply don't do it.

Edit: better link - see reference to 'bit31', which was previously never returned as set. Once it could be set (over 2 gigs of RAM, gasp!) it would break naughty programs and therefore programs needed to opt into this option once this much memory became the norm as people had used trickery like this (amongst other things). And now my lovely, short and to the point answer has become too long :-)

Community
Mike Vine
4

So would this work or would I run into problems in either C or C++?

Do you have 64 bits? Do you want your code to be portable to 32-bit systems? `long` does not necessarily have 64 bits. Big-endian vs. little-endian? (Do you know which your system is?)

Plus, hopeless confusion. Please just use an extra variable to store this information or you will have many many bugs surrounding this.

djechlin
3

It depends on the architecture. The x86_64 architecture, for example, currently uses 48-bit addressing. This means that you could use 16 bits for your own needs (a trick that is sometimes referred to as "pointer packing"). However, even the x86_64 architecture definition allows this limit to be raised in future implementations to the full 64 bits. If that happens, a lot of your code might need to be changed. So if you really must go that way, make sure your pointer packing is kept in one place that is easy to change in the future. For other architectures you have to check for yourself.
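
If you do go down this road, "kept in one place" might look something like the following sketch. The `packed` namespace and its functions are hypothetical. It assumes 64-bit pointers and 48-bit canonical addresses, so `kAddressBits` is the single constant you would have to revisit if the limit is ever raised.

```cpp
#include <cassert>
#include <cstdint>

namespace packed {

static_assert(sizeof(void*) == 8, "this sketch assumes 64-bit pointers");

// Assumption: only bits 0..47 of an address carry information.
constexpr unsigned kAddressBits = 48;
constexpr std::uint64_t kAddressMask = (std::uint64_t{1} << kAddressBits) - 1;

// Rebuild the pointer by sign-extending bit 47 into bits 48..63.
inline void* pointer_of(std::uint64_t word) {
    std::uint64_t raw = word & kAddressMask;
    if (raw & (std::uint64_t{1} << (kAddressBits - 1))) {
        raw |= ~kAddressMask;
    }
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(raw));
}

// Extract the 16-bit tag stored in bits 48..63.
inline std::uint16_t tag_of(std::uint64_t word) {
    return static_cast<std::uint16_t>(word >> kAddressBits);
}

// Store a 16-bit tag in the bits the address does not use.
inline std::uint64_t pack(void* p, std::uint16_t tag) {
    const auto raw =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
    const std::uint64_t word =
        (raw & kAddressMask) | (static_cast<std::uint64_t>(tag) << kAddressBits);
    // Fails if the pointer was not a canonical 48-bit address to begin with.
    assert(pointer_of(word) == p && "pointer does not fit in kAddressBits");
    return word;
}

}  // namespace packed
```

The round-trip assertion in `pack` is there so that a platform with wider addresses fails loudly in debug builds rather than silently corrupting pointers.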

1

Take a look at boost::lockfree::detail::tagged_ptr from boost.lockfree

This is a class that was introduced in the latest Boost release, 1.53. It stores a pointer and an additional 16 bits in a 64-bit variable.

Nikolay Viskov
  • Can you post more info? I can't learn anything from this answer without following through the link. – djechlin May 20 '13 at 21:04
1

Unless you really need the space, or you're keeping a lot of these things around, I would just use a plain union and add a tag field. If you're going to go down that route, make sure that your memory is aligned to fit your needs.
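
A plain union with a tag field, as described above, might be sketched like this. `Value`, `Kind` and the helper functions are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged union: the tag lives in its own field, so no bits of
// the pointer or the integer are borrowed.
struct Value {
    enum class Kind : std::uint8_t { Integer, Pointer };

    Kind kind;
    union {
        std::int64_t integer;
        void* pointer;
    };
};

inline Value make_integer(std::int64_t n) {
    Value v;
    v.kind = Value::Kind::Integer;
    v.integer = n;
    return v;
}

inline Value make_pointer(void* p) {
    Value v;
    v.kind = Value::Kind::Pointer;
    v.pointer = p;
    return v;
}

// Accessors check the tag before reading the matching union member.
inline std::int64_t get_integer(const Value& v) {
    assert(v.kind == Value::Kind::Integer);
    return v.integer;
}

inline void* get_pointer(const Value& v) {
    assert(v.kind == Value::Kind::Pointer);
    return v.pointer;
}
```

On a typical 64-bit target this costs 16 bytes per value instead of 8 because of padding after the tag, which is the space trade-off this answer is talking about.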

mattsills
0

Remember that the virtual address returned to your program does not necessarily line up with the actual physical address in memory. In fact, unless you are directly manipulating pretty special memory [e.g. some forms of graphics memory], this is absolutely the case.

In this case, it's the maximum value of the MMU which defines the values of the pointers your program sees. For x64 I'm pretty sure it's (currently) 48 bits, but as Mats specifies above, once you've got the top bit set in the 48, you get the 63rd bit set as well.

So taking his answer and mine: it's entirely possible to get a pointer with the 47th bit set even with a small amount of RAM, and once you do, you get the 63rd bit set.

Mike Vine
0

Don't do such tricks. If you need to distinguish integers from pointers inside some container, consider using a separate bit set to hold that flag. In C++, `std::bitset` could be good enough.

Reasons:

  • Nobody actually guarantees that pointers are the size of `unsigned long` or `unsigned long long`. If you need to store them, always use `sizeof()` and the `void *` type (if you need to remove information about the pointed-to object).
  • Even on one system, addresses are highly dependent on the architecture.
  • Kernel modules could seriously change the mapping logic for a process, so you never know what addresses you will need.
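
As a rough illustration of the separate-flag idea, here is a sketch of a fixed-size container that keeps the "is this a pointer?" bits in a `std::bitset` next to the raw words. `MixedSlots` and its member names are invented for this example. It stores pointers as `std::uintptr_t`, which is an optional type in the standard but is available on mainstream platforms.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical container of N slots, each holding either an integer or a
// pointer. The kind of each slot is tracked in a separate bitset, so the
// full range of both integers and pointers stays usable.
template <std::size_t N>
class MixedSlots {
public:
    void set_integer(std::size_t i, std::uintptr_t value) {
        assert(i < N);
        words_[i] = value;
        is_pointer_.reset(i);
    }

    void set_pointer(std::size_t i, void* p) {
        assert(i < N);
        words_[i] = reinterpret_cast<std::uintptr_t>(p);
        is_pointer_.set(i);
    }

    bool holds_pointer(std::size_t i) const { return is_pointer_.test(i); }

    std::uintptr_t integer_at(std::size_t i) const {
        assert(!holds_pointer(i));
        return words_[i];
    }

    void* pointer_at(std::size_t i) const {
        assert(holds_pointer(i));
        return reinterpret_cast<void*>(words_[i]);
    }

private:
    std::uintptr_t words_[N] = {};
    std::bitset<N> is_pointer_;  // bit i set means slot i holds a pointer
};
```

The overhead is one bit per slot, and an integer with its top bit set is stored without any ambiguity.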
Roman Nikitchenko
0

If the "64-bit system" in question is x86_64, then yes, it will work.

Zdeněk Pavlas