10

To avoid maintaining complex data structures, I want to allocate blocks with quite large alignment (say some kilobytes, possibly megabytes, always by power of two). This allows me to mask the lower bits of a pointer to easily retrieve the address of the beginning of the block it points in.

I'd like a method to guarantee the allocation of such a block with specified alignment, eg. to allocate 4096 byte blocks with 4096 byte alignment. For the method to work, the alignment will always be the size of the blocks, so memory waste is expected to be a concern in the long run.

I'm using C++ (so C and C++ techniques are fine), and any solution should be portable across common desktop environments. Should there be no portable solution, Linux has highest priority.

I'm aware of Win32 memory allocation with large alignment, but if there is a common C library which does this with one function call, I'd happily use it.

Background: I'm experimenting with the Vlist structures described there (ultimate goal is a sort of Scheme interpreter), and I'm currently implementing garbage collection for those lists. I need the quite large memory blocks as arenas for the garbage collector. Should I change the GC technique, I still need the VList blocks to have 32 byte alignment (I'm performing my experiments on 64bit machines).

Community
  • 1
  • 1
Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
  • @Pubby: not portable. I know about it, but I'd like to know about alternatives. – Alexandre C. Jan 14 '12 at 14:20
  • I don't think you're going to find anything that's generally portable. – Hot Licks Jan 14 '12 at 14:21
  • You can always roll your own, although it requires some shenanigans. – Pubby Jan 14 '12 at 14:22
  • Have you looked at overloading the new and delete operators? You can do it at the class level or globaly. – Rob Jan 14 '12 at 14:49
  • There can't really be anything "portable", because C++ doesn't make any statement about the *numeric* value of a pointer. Basically, reinterpreting a pointer as an integer is a platform-dependent operation, and from there on you're on your own. – Kerrek SB Jan 14 '12 at 14:53
  • @KerrekSB: I know this. In my case portable should read x86 + linux or windows. – Alexandre C. Jan 14 '12 at 15:05
  • The C11 standard has `aligned_alloc`. This interface is a bit different from `posix_memalign` in that it is less detailed in the error return if it fails. My guess would be that these parts are synchronized between C11 and C++11, so this could be in C++11, too. – Jens Gustedt Jan 14 '12 at 15:18
  • @AlexandreC. I'm wondering if you've tried measuring the overhead of the library aligned mallocs from the answers (`posix_memalign`, '_mm_malloc()'). I won't be surprised at all if they also have a high overhead since they could also be implemented in a similar way as the one in my answer. In that case, then probably the only solution is to write your own allocator that does the bookkeeping separately. (In other words, doesn't put any meta information next to the allocated region.) That would be the only way to avoid the large overhead. – Mysticial Jan 14 '12 at 23:45
  • @Mysticial: the overhead will be measured once the GC is complete, because many things will add overhead over the top. I'm experimenting with the mentioned allocators (the one you mentioned, and I will also try nedmalloc), and I think you're right in what you say: allocating larger blocks and cutting them in chunks myself is a solution to the memory waste problem, should it present itself. – Alexandre C. Jan 15 '12 at 10:19

3 Answers3

4

I'm not aware of a fully portable solution. But _mm_malloc() and _mm_free() seem to be supported by ICC, GCC, and MSVC.

This was added as part of the aligned memory support for SSE intrinsics.


Otherwise, you could implement your own fairly easily:

void* my_malloc(size_t bytes,size_t align){

    void *ptr = malloc(bytes + align + sizeof(intptr_t));

    if (ptr == NULL)
        return NULL;

    //  Get aligned return address
    intptr_t *ret = (intptr_t*)((((intptr_t)ptr + sizeof(intptr_t)) & ~(intptr_t)(align - 1)) + align);

    //  Save the free pointer
    ret[-1] = (intptr_t)ptr;

    return ret;
}

void my_free(void *ptr){
    if (ptr == NULL)
        return;

    //  Get the free pointer
    ptr = (void*)(((intptr_t*)ptr)[-1]);

    free(ptr); 
}
Community
  • 1
  • 1
Mysticial
  • 464,885
  • 45
  • 335
  • 332
  • Saving the pointer like this may raise fragmentation issues should I begin allocating a bunch of 1Mb blocks. But I will consider this a premature optimization issue. Thanks for the `_mm_malloc/free` pointer, this seems very close to what I want. – Alexandre C. Jan 14 '12 at 14:30
  • The amount of wasted space is `align + sizeof(intptr_t)`. I suppose that isn't too bad if you're allocating a bunch of 1MB blocks with only 4k alignment. But if you're going to do it with 1MB alignment, then yes, it's roughly 50% overhead. – Mysticial Jan 14 '12 at 14:42
  • Yes, roughly half of the memory would be wasted in my case. For development, this is not a problem, but eventually I would have to deal with the issue. – Alexandre C. Jan 14 '12 at 14:50
  • Come to think of it, you're (almost) guaranteed to waste half your address space in any implementation. But I suppose one that works with the VM at a low level will waste no more than 1 page per allocation. – Mysticial Jan 14 '12 at 14:52
1

Intel Thread Building Blocks have a open-source cross-platform scalable memory allocator with support for alignment.

void* scalable_aligned_malloc(size_t size, size_t alignment);
ronag
  • 49,529
  • 25
  • 126
  • 221
1

efficient large alignment that is portable is really possible without using system calls, in which case you could just build a wrapper around VirtualAlloc and mmap, this would give you page level alignment, generally 64kb.

but if you only need 32 bytes, just copy the source from the windows crt for aligned malloc and free, its backed by standard malloc and should be perfectly portable(even better would be the glibc version). alternatively you could look into a custom allocater like nedmalloc

Necrolis
  • 25,836
  • 3
  • 63
  • 101