14

Let me just say up front that what I'm aware that what I'm about to propose is a mortal sin, and that I will probably burn in Programming Hell for even considering it.

That said, I'm still interested in knowing if there's any reason why this wouldn't work.

The situation is: I have a reference-counting smart-pointer class that I use everywhere. It currently looks something like this (note: incomplete/simplified pseudocode):

class IRefCountable
{
public:
    IRefCountable() : _refCount(0) {}
    virtual ~IRefCountable() {}

    void Ref() {_refCount++;}
    bool Unref() {return (--_refCount==0);}

private:
    unsigned int _refCount;
};

class Ref
{
public:
   Ref(IRefCountable * ptr, bool isObjectOnHeap) : _ptr(ptr), _isObjectOnHeap(isObjectOnHeap) 
   { 
      _ptr->Ref();
   }

   ~Ref() 
   {
      if ((_ptr->Unref())&&(_isObjectOnHeap)) delete _ptr;
   }

private:
   IRefCountable * _ptr;
   bool _isObjectOnHeap;
};

Today I noticed that sizeof(Ref)=16. However, if I remove the boolean member variable _isObjectOnHeap, sizeof(Ref) is reduced to 8. That means that for every Ref in my program, there are 7.875 wasted bytes of RAM... and there are many, many Refs in my program.

Well, that seems like a waste of some RAM. But I really need that extra bit of information (okay, humor me and assume for the sake of the discussion that I really do). And I notice that since IRefCountable is a non-POD class, it will (presumably) always be allocated on a word-aligned memory address. Therefore, the least significant bit of (_ptr) should always be zero.

Which makes me wonder... is there any reason why I can't OR my one bit of boolean data into the least-significant bit of the pointer, and thus reduce sizeof(Ref) by half without sacrificing any functionality? I'd have to be careful to AND out that bit before dereferencing the pointer, of course, which would make pointer dereferences less efficient, but that might be made up for by the fact that the Refs are now smaller, and thus more of them can fit into the processor's cache at once, and so on.

Is this a reasonable thing to do? Or am I setting myself up for a world of hurt? And if the latter, how exactly would that hurt be visited upon me? (Note that this is code that needs to run correctly in all reasonably modern desktop environments, but it doesn't need to run in embedded machines or supercomputers or anything exotic like that)

phuclv
  • 37,963
  • 15
  • 156
  • 475
Jeremy Friesner
  • 70,199
  • 15
  • 131
  • 234
  • 6
    That trick is used in Boost - see at the end of the [the boost::multi_index](http://www.boost.org/doc/libs/1_35_0/libs/multi_index/doc/tutorial/indices.html#ordered_node_compression) docs. – Aaron McDaid Jun 13 '11 at 04:06
  • >> there are 7.875 wasted bytes of RAM. What? You have analog bytes? – Steve Wellens Jun 13 '11 at 04:08
  • 10
    Well, 7 bytes of padding, plus 7 bits in the final byte that aren't used, because boolean values only require one bit. Sort of like how the average family has 2.5 kids :) – Jeremy Friesner Jun 13 '11 at 04:12
  • Is it allowed to answer the ways we can skip the `bool` variable ? – iammilind Jun 13 '11 at 04:16
  • I just wanted to mention that this is also used in Boost.Function, to store wether the saved function object is trivially destructible and some other stuff. – Xeo Jun 13 '11 at 04:33
  • Really gotta ask why you need `isOnHeap`. If it's not on heap, then why are you reference counting it? – Puppy Jun 13 '11 at 11:30
  • @DeadMG Sometimes I have a function that only takes a Ref as an argument, but I nevertheless want to pass it an object that is on the stack (or is a static or global). In that case I need a way to have a Ref that acts like a normal/dumb pointer, and in particular I need the Ref to refrain from calling delete on its IRefCountable. – Jeremy Friesner Jun 13 '11 at 16:39
  • 1
    On x86 you might be better off using the high bit, as on most ABIs that space is reserved for the kernel (and since on x86 you can easily have an unaligned pointer though as you say hopefully for a non-POD the allocator will never make one). No matter what bit you use, you should make sure to check in your constructor that the bit is cleared on entry. – Jack Lloyd Jun 13 '11 at 16:40
  • @JackLloyd that's a bad idea. The use of the high bit in many apps is the reason why the [`/LARGEADDRESSAWARE` flag](https://docs.microsoft.com/en-us/cpp/build/reference/largeaddressaware-handle-large-addresses?view=msvc-160&viewFallbackFrom=vs-2017) was introduced. 32-bit apps have it turned off so they can't access more than 2GB of RAM by default. It's also why FSGSBASE support in Linux was so late compared to Windows: [the Linux kernel uses the most significant bit of the address as a flag](https://software.intel.com/security-software-guidance/best-practices/guidance-enabling-fsgsbase) – phuclv Mar 27 '21 at 15:54
  • @JackLloyd the high bit(s) [are good to use in x86-64](https://stackoverflow.com/q/16198700/995714) with great care though. ARM64 also have an option to ignore the high bits for use in tagged pointers called "Top Byte Ignore" or TBI – phuclv Mar 27 '21 at 15:58
  • Side note: I discovered (the fun way) that recent 64-bit versions of Android reserve the highest 16 bits of pointers for security fingerprinting, so if you try to shoehorn your own data into those bits, your program will crash. (Details here: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/enhancing-memory-safety ) – Jeremy Friesner Jun 09 '23 at 17:20

7 Answers7

3

If you want to use only the standard facilities and not rely on any implementation then with C++0x there are ways to express alignment (here is a recent question I answered). There's also std::uintptr_t to reliably get an unsigned integral type large enough to hold a pointer. Now the one thing guaranteed is that a conversion from the pointer type to std::[u]intptr_t and back to that same type yields the original pointer.

I suppose you could argue that if you can get back the original std::intptr_t (with masking), then you can get the original pointer. I don't know how solid this reasoning would be.

[edit: thinking about it there's no guarantee that an aligned pointer takes any particular form when converted to an integral type, e.g. one with some bits unset. probably too much of a stretch here]

Community
  • 1
  • 1
Luc Danton
  • 34,649
  • 6
  • 70
  • 114
  • +1 for mentioning a way that invokes undefined behavior in a way that likely has the most predictable results and detectable failure modes of any method here. – Omnifarious Jun 13 '11 at 13:00
2

The problem here is that it is entirely machine-dependent. It isn't something one often sees in C or C++ code, but it has certainly been done many times in assembly. Old Lisp interpreters almost always used this trick to store type information in the low bit(s). (I have seen it in C code, but in projects that were being implemented for a specific target platform.)

Personally, if I were trying to write portable code, I probably wouldn't do this. The fact is that it will almost certainly work on "all reasonably modern desktop environments". (Certainly, it will work on every one I can think of.)

A lot depends on the nature of your code. If you are maintaining it, and nobody else will ever have to deal with the "world of hurt", then it might be OK. You will have to add #ifdef directives for any odd architecture that you might need to support later on. On the other hand, if you are releasing it to the world as "portable" code, that would be cause for concern.

Another way to handle this is to write two versions of your smart pointer, one for machines on which this will work and one for machines where it won't. That way, as long as you maintain both versions, it won't be that big a deal to change a config file to use the 16-byte version.

It goes without saying that you would have to avoid writing any other code that assumes sizeof(Ref) is 8 rather than 16. If you are using unit tests, run them with both versions.

user16217248
  • 3,119
  • 19
  • 19
  • 37
andrewdski
  • 5,255
  • 2
  • 20
  • 20
  • Linux uses the lower bit of a pointer as a boolean for its red-black tree implementation. This is done without any flags for "weird architectures" and such. I would consider Linux to be at the upper end of portability. So I think we can safely say that this works safely and reliably. (See: https://github.com/torvalds/linux/blob/master/tools/include/linux/rbtree_augmented.h#L152) – Eric Scrivner Feb 23 '23 at 17:07
1

Any reason? Unless things have changed in the standard lately, the value representation of a pointer is implementation-defined. It is certainly possible that some implementation somewhere may pull the same trick, defining these otherwise-unused low bits for its own purposes. It's even more possible that some implementation might use word-pointers rather than byte-pointers, so instead of two adjacent words being at "addresses" 0x8640 and 0x8642, they would be at "addresses" 0x4320 and 0x4321.

One tricky way around the problem would be to make Ref a (de facto) abstract class, and all instances would actually be instances of RefOnHeap and RefNotOnHeap. If there are that many Refs around, the extra space used to store the code and metadata for three classes rather than one would be made up by the space savings in having each Ref being half the size. (Won't work too well, the compiler can omit the vtable pointer if there are no virtual methods and introducing virtual methods will add the 4-or-8 bytes back to the class).

Anomie
  • 92,546
  • 13
  • 126
  • 145
  • 1
    of course abstract means with virtual methods and then instead of a "byte" used for the boolean, you have a full-blown pointer to the vtable (in traditional implementation)... – Matthieu M. Jun 13 '11 at 11:08
  • @MatthieuM.: Good point, I haven't messed with C++ for a while and forgot that it can omit the vtable pointer if there are no virtual methods. – Anomie Jun 13 '11 at 12:49
1

You always have at least a free bit to use in the pointer as long as

  1. you're not pointing to arbitrary positions inside a struct or array with alignment of 1, or
  2. the platform gives you a free bit

Since IRefCountable has an alignment of 4, you'll have 2 free bottom bits in IRefCountable* to use


Regarding the first point, storing data in the least significant bit is always reliable if the pointer is aligned to a power of 2 larger than 1. That means it'll work for everything apart from char*/bool* or a pointer to a struct containing all char/bool members, and obviously it'll work for IRefCountable* in your case. In C++11 you can use alignof or std::alignment_of to ensure that you have the required alignment like this

static_assert(alignof(Ref) > 1);
static_assert(alignof(IRefCountable) > 1);

// This check for power of 2 is likely redundant
static_assert((alignof(Ref) & (alignof(Ref) - 1)) == 0);

// Now IRefCountable* is always aligned,
// so its least significant bit can be used freely

Even if you have some object with only 1-byte alignment, for example if you change the _refCount in IRefCountable to uint8_t, then you can still enforce alignment requirement with alignas, or with other extensions in older C++ like __declspec(align). Dynamically allocated memory is already aligned to max_align_t, or you can use aligned_alloc() for a higher level alignment


My second bullet point means in case you really need to store arbitrary pointers to objects with absolute 1-byte alignment then most of the time you can still utilize the feature from the platform

  • On many 32-bit platforms the address space is split in half for user and kernel processes. User pointers will always have the most significant bit unset so you can use that to store data. Of course it won't work on platforms with more than 2GB of user address space, like when the split is 3/1 or 4/4
  • On 64-bit platforms currently most have only 48-bit virtual address, and a few newer high-end CPUs may have 57-bit virtual address which is far from the total 64 bits. Therefore you'll have lots of bits to spare. And in reality this always work in personal computing since you'll never be able to fill that vast address space

This is called tagged pointer

If the data is always heap-allocated then you can tell the OS to limit the range of address space to use to get more bits

For more information read Using the extra 16 bits in 64-bit pointers

phuclv
  • 37,963
  • 15
  • 156
  • 475
1

Yes, this can work reliably. This is, in fact, used by the Linux kernel as part of its red-black tree implementation. Instead of storing an extra boolean to indicate whether a node is red or black (which can take up quite a bit of additional space), the kernel uses the low-order bit of the parent node address.

From rbtree_types.h:

struct rb_node {
    unsigned long  __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));

The __rb_parent_color field stores both the address of the nodes parent and the color of the node (in the least-significant bit).

Getting The Pointer

To retrieve the parent address from this field you just clear the lower order bits (this clears the lowest 2-bits).

From rbtree.h:

#define rb_parent(r)   ((struct rb_node *)((r)->__rb_parent_color & ~3))

Getting The Boolean

To retrieve the color you just extract the lower bit and treat it like a boolean.

From rbtree_augmented.h:

#define __rb_color(pc)     ((pc) & 1)
#define __rb_is_black(pc)  __rb_color(pc)
#define __rb_is_red(pc)    (!__rb_color(pc))
#define rb_color(rb)       __rb_color((rb)->__rb_parent_color)
#define rb_is_red(rb)      __rb_is_red((rb)->__rb_parent_color)
#define rb_is_black(rb)    __rb_is_black((rb)->__rb_parent_color)

Setting The Pointer And Boolean

You set the pointer and boolean value using standard bit manipulation operations (making sure to preserve each part of the final value).

From rbtree_augmented.h:

static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
{
    rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
}

static inline void rb_set_parent_color(struct rb_node *rb,
                       struct rb_node *p, int color)
{
    rb->__rb_parent_color = (unsigned long)p | color;
}

You can also clear the boolean value setting it to false via (unsigned long)p & ~1.

Eric Scrivner
  • 1,839
  • 1
  • 18
  • 23
0

There will be always a sense of uncertainty in mind even if this method is working, because ultimately you are playing with the internal architecture which may or may not be portable.

On the other hand to solve this problem, if you want to avoid bool variable, I would suggest a simple constructor as,

Ref(IRefCountable * ptr) : _ptr(ptr) 
{
  if(ptr != 0) 
    _ptr->Ref();
}

From the code, I smell that the reference counting is needed only when the object is on heap. For automatic objects, you can simply pass 0 to the class Ref and put appropriate null checks in constructor/destructor.

iammilind
  • 68,093
  • 33
  • 169
  • 336
  • The problem with setting the pointer to NULL is that sometimes I need to pass a stack-object to a function that takes a Ref... – Jeremy Friesner Jun 13 '11 at 04:40
  • @Jeremy Friesner: Then overload that function to take a raw pointer. – Puppy Jun 13 '11 at 11:32
  • @DeadMG ... and also overload all the functions it calls, and all the functions they call, and so on? I think I'd end up maintaining two separate copies of my codebase :) – Jeremy Friesner Jun 13 '11 at 16:42
0

Have you thought about an out of class storage ?

Depending on whether you have (or not) to worry about multi-threading and control the implementation of new/delete/malloc/free, it might be worth a try.

The point would be that instead of incrementing a local counter (local to the object), you would maintain a "counter" map address --> count that would haughtily ignore addresses passed that are outside the allocated area (stack for example).

It may seem silly (there is room for contention in MT), but it also plays rather nice with read-only since the object is not "modified" only for counting.

Of course, I have no idea of the performance you might hope to achieve with this :p

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722