
I'm looking to call a special constructor/destructor for stack allocated instances of a class, and to either:

  1. Ban all other instances (I think this is impossible).

or

  2. Make them call different constructors/destructors.

Is this possible to do portably?

Why would I want to do this?
I'm writing a container class library that holds a potentially large dynamically allocated buffer.
Conceptually it should work like:

```cpp
class SArray {
public:
    size_t size;
    char* data;

    SArray(size_t _size) : size(_size) {
        data = (char*)malloc(size);
    }

    ~SArray() {
        free(data);
    }

    // ... some operators, etc. ...
};
```

However, this is a major slowdown for users in my field (high-performance robotics). When you allocate from the system, big allocations are first mapped to the zero page, causing page faults when you write to them, and then, for security reasons, the kernel spends a lot of time zeroing the data before giving it to you. My users tend to allocate many small buffers (sub-KB) and very large buffers (1-1000 MB).

If we can be sure that the constructors and destructors are called in LIFO (stack) order, then we can just have a giant virtual stack to do our memory allocations from:

```cpp
class SArray {
public:
    static inline char backing_stack[1'000'000'000'000]; // 1 TB, at least as big as all of RAM
    static inline size_t sp = 0;

    size_t size;
    char* data;

    SArray(size_t _size) : size(_size) {
        data = backing_stack + sp;
        sp += size;
    }

    ~SArray() {
        sp -= size;
    }

    // ... some operators, etc. ...
};
```

Here the page faults and zeroing by the kernel only happen the first time we touch a given part of our backing_stack [1].

This answer shows that LIFO order is enforced in C++ independently for each storage type, which means that if we declare:

```cpp
void foo() {
    static SArray static_decl(100);
    thread_local SArray tl_decl(1000);
    SArray local_decl(2000);
}
```

then strange things could happen if they all use the same backing_stack. The obvious and preferred solution is to have constructors that use different stacks for different storage types.

If that's not possible, we could settle for a class that simply cannot be created except as a local variable, as long as we can automatically detect that someone tried to do it and issue a compiler error. The speed gain of preallocating is so great that I am simply willing to ban other uses if needed.

So far, I have found that you can delete the new, placement new, and delete operators, so someone accidentally heap allocating isn't a big worry. However, unsuspecting users could try to create static or thread_local instances. Temporaries and lifetime extension also seem like they could pose a problem.

A final option I explored and rejected was a hash-table-based allocation scheme. The problem there was that memory gets fragmented if users allocate many containers smaller than the page size; dealing with that fragmentation is what makes malloc() so slow for small items compared to the call stack.

[1]: We can't use alloca() or equivalent because our allocations are often so large that they blow past the guard page(s), silently corrupting our stack.

  • How about instances of the class that are member variables of other types? Do you want the "stack vs heap" property to be transient? –  Jul 13 '21 at 22:42
  • What I suggest you do is: a) stop mixing allocators and containers - create an independent allocator class. b) In your container, properly interface with allocator whenever you are removing objects (i.e. as a result of a destruction). In allocator, implement proper freeing/allocating and do not rely on any particular order. – SergeyA Jul 13 '21 at 22:44
  • If the slow part is allocating from the OS and zeroing memory, then yeah, just make a custom allocator that preallocates from the OS. Then you can use all the standard C++ containers still. – Mooing Duck Jul 13 '21 at 22:47
  • @Frank Probably would have to make sure the solution can apply to members. I would be perfectly happy enforcing that this type not be allowed as a member type, but I can't think of a way to do that in C++. @mooing I think Frank is especially thinking of the case where we `new` an object with our type as a member, which unfortunately would not trigger our custom new operator. – PaulWithAHat Jul 13 '21 at 23:47
  • @SergeyA Actually I have done that in the real code, but it was simpler to combine them in the MWE. My use case is a matrix library similar to Eigen, but with a bit different design goals. As for "implement proper freeing/allocating and do not rely on any particular order", can you suggest how to do this without the downsides of the hash table I mentioned? – PaulWithAHat Jul 14 '21 at 00:11
  • The stack ordering constraint helps a lot in my tests, because we don't have to dig around to find free blocks or worry about memory fragmentation, which is also why the compiler uses the call stack to store locals. – PaulWithAHat Jul 14 '21 at 00:13
  • @Mooing Duck: Yep, actually I have seen a lot of users that have macros like: `#define STACKARRAY(NAME, SIZE) thread_local vector NAME##backing(SIZE); SArray NAME(0); NAME.data=&NAME##backing[0]; NAME.size = SIZE;` Those macros are the reason I realized this was a problem users cared about. They work, but they break re-entrancy, and require predeclaring all of the containers you will use in a function. Also hard to read, since they don't look like variable declarations. – PaulWithAHat Jul 14 '21 at 00:27
  • So I think the answer is, or is similar to, [`boost::pool_allocator`](https://www.boost.org/doc/libs/1_41_0/libs/pool/doc/interfaces/pool_alloc.html) – Mooing Duck Jul 14 '21 at 01:30
  • @MooingDuck No need (at least with C++17) to bring in `boost`, the [std library](https://en.cppreference.com/w/cpp/header/memory_resource) comes with several allocators – Quxflux Jul 21 '21 at 15:39

0 Answers