
I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I'm pretty sure a cross-platform solution does not exist, I'm mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD.

  1. What's the best way of getting a chunk of memory aligned on 16-byte boundaries? (I'm aware of the trivial method of using malloc(), allocating a little extra space and then bumping the pointer up to a properly aligned value. I'm hoping for something a little less kludge-y, though. Also, see below for additional issues.)

  2. If I use plain old malloc(), allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling free() on pointers to the middle of the block seems to work in practice on Windows, but I'm wondering what the standard says and, even if the standard says you can't, whether it works in practice on all major OS's. I don't care about obscure DS9K-like OS's.)

  3. This is the hard/interesting part. What's the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling malloc(), copying, and then calling free() on the old block. I'd like to do it in place where possible.

dsimcha
  • Regarding #3, if you're using `realloc` correctly, it will almost always invoke `malloc`-and-`memcpy`, so don't worry about trying to find a solution to this. – R.. GitHub STOP HELPING ICE Feb 21 '11 at 01:47
  • 1
    @R, realloc would be very bad if it didn't first try to expand the current block into free heap. Only if that is impossible should it do the inefficient malloc/copy. – paxdiablo Feb 21 '11 at 01:56
  • 2
    "Calling free() on pointers to the middle of the block seems to work in practice on Windows" -- I doubt it. – Jim Balter Feb 21 '11 at 02:50
  • 3
    @Jim, maybe the crashing caused by that is just lost in the general noise of Windows crashing everywhere else :-) [[pax ducks for cover under the onslaught of offended Windows bods]]. – paxdiablo Feb 21 '11 at 03:26
  • @Jim: This is based only on a very quick test program I wrote. I find it amazing that I've learned enough about memory management to ask a question like this without ever running into that particular issue (about calling free() on pointers to the middle of a block) before. – dsimcha Feb 21 '11 at 15:43
  • @paxdiablo: It would be even worse if it couldn't handle the shrinking case without moving the block. I find it absurd that the Standard doesn't have a realloc variation which would be limited to shrinking blocks, but would not affect the validity of any existing pointers to the block or the contents thereof. *Any* implementation should be able to guarantee such functionality in all cases, since an implementation would be free to have such a function do nothing in any case where it couldn't do anything useful. – supercat Mar 16 '17 at 22:17
  • Possible duplicate of [aligned malloc() in GCC?](http://stackoverflow.com/questions/3839922/aligned-malloc-in-gcc) – Ciro Santilli OurBigBook.com Mar 28 '17 at 07:17

7 Answers

  1. If your implementation has a standard data type that needs 16-byte alignment (a 128-bit long long, for example), malloc already guarantees that your returned blocks will be aligned correctly. Section 7.20.3 of C99 states: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object."

  2. You have to pass back the exact same address into free as you were given by malloc. No exceptions. So yes, you need to keep the original copy.

  3. See (1) above if you already have a 16-byte-alignment-required type.

Beyond that, you may well find that your malloc implementation gives you 16-byte-aligned addresses anyway for efficiency although it's not guaranteed by the standard. If you require it, you can always implement your own allocator.

Myself, I'd implement a malloc16 layer on top of malloc that would use the following structure:

some padding for alignment (0-15 bytes)
size of padding (1 byte)
16-byte-aligned area

Then have your malloc16() function call malloc to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area.

For free16, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc'ed block from that, and pass that to free.

This is untested but should be a good start:

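#include <stdint.h>
#include <stdlib.h>

void *malloc16 (size_t s) {
    unsigned char *p;
    unsigned char *porig;
    if (s > SIZE_MAX - 16) return NULL;         // s + 16 would wrap around
    porig = malloc (s + 16);                    // allocate extra
    if (porig == NULL) return NULL;             // catch out of memory
    // insert padding: & is not defined on pointers, so round via uintptr_t
    p = (unsigned char *)(((uintptr_t)porig + 16) & ~(uintptr_t)0xf);
    *(p-1) = (unsigned char)(p - porig);        // store padding size (1 to 16)
    return p;
}

void free16(void *p) {
    unsigned char *porig = p;                   // work out original
    if (porig == NULL) return;                  // like free(NULL), do nothing
    porig = porig - *(porig-1);                 // by subtracting padding
    free (porig);                               // then free that
}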

The magic line in malloc16 is the one that computes p: it adds 16 to the address and then clears the lower 4 bits, in effect rounding back down to the previous 16-byte boundary. Because of the +16, the result is always 1 to 16 bytes past the actual start of the malloc'ed block, so there is always room for the padding-length byte.
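To check the rounding step by hand, here it is applied to plain integers instead of pointers (next_aligned_after is a hypothetical name used only for this illustration). For a block starting at 0x1003, adding 16 gives 0x1013 and clearing the low 4 bits gives 0x1010, so the padding is 13 bytes; a block that is already aligned, such as 0x1000, gets the full 16 bytes of padding.

```c
#include <stdint.h>

/* Hypothetical helper: the malloc16 rounding step on an integer address.
   Add 16, then clear the low 4 bits. The result is always a multiple of 16
   and always 1 to 16 bytes past the input. */
uintptr_t next_aligned_after(uintptr_t addr) {
    return (addr + 16) & ~(uintptr_t)0xf;
}
```

The padding stored in the byte before the aligned area is simply next_aligned_after(base) - base, which is why a single unsigned char is enough to hold it.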

Now, I don't claim that the code above is anything but kludgey. You would have to test it on the platforms of interest to see if it's workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.

paxdiablo
    From man posix_memalign on Linux: "GNU libc malloc() always returns 8-byte aligned memory addresses". Re 7.20.3 - alignment for any pointer doesn't mean it has to be 16 bytes. – Tony Delroy Feb 21 '11 at 01:31
  • 2
    @Tony, if you have a 16-byte object with 16-byte alignment requirement, malloc is _required_ to return you an address that satisfies that. It's not aligning for a pointer, it's aligning for an object that can be pointed at. – paxdiablo Feb 21 '11 at 01:41
  • 2
    @paxdiablo: The C language's requirement that `malloc` return a suitably aligned pointer only applies for types that exist within the scope of the C language. If OP is writing asm using SSE instructions, for instance, larger alignment may be required, and of course it is not the C implementation's responsibility to provide it. Also it's possible that OP wants pointers that fit in (e.g.) 28 bits to pack other data compactly along with pointers. :-) – R.. GitHub STOP HELPING ICE Feb 21 '11 at 01:45
  • @paxdiablo: That requirement does not apply to types such as `__m128`, since the use of such a type is formally undefined behaviour, which absolves `malloc()` of all its responsibilities. – caf Feb 21 '11 at 01:47
  • Sorry, @R, I see the misunderstanding now. Fixed the first point to state "if your implementation ..." – paxdiablo Feb 21 '11 at 01:48
  • 1
    @caf, then it's the use of __m128 that introduces UB. If the type requiring 16-byte alignment is not a UB one (such as a perfectly valid 128-bit long long), then malloc must honour the alignment. That's the point I was trying to get across: badly, by the looks of it :-) – paxdiablo Feb 21 '11 at 01:54
  • @paxdiablo: Right, but if you do want to use `__m128` then you will need an allocation function that can return correctly-aligned blocks, and since `malloc()` cannot be relied upon for this, another method is needed, which is what the question is about. – caf Feb 21 '11 at 01:58
  • For true portability the above code doesn't quite work. Suppose that the implementation has *no* alignment requirements, and hence the implementation's `malloc` can return a value congruent to 15 mod 16. Then you only add 1 when you adjust it, and there isn't space to write a pointer value. For portability you need to allocate more space, and then for *efficiency* you want to cut that space down a bit based on knowledge of the alignment actually provided by `malloc`, perhaps as a `#define` that can optionally be provided by porters. – Steve Jessop Feb 21 '11 at 01:58
  • @caf, and that would be what the _rest_ of my voluminous essay is about :-) – paxdiablo Feb 21 '11 at 01:59
  • @Steve, you may be viewing the original answer where I did indeed store a pointer. Not so now - I store a single byte padding length and the addition of 16 before &~0xf guarantees there will be 1-to-16 bytes before the actual data area. – paxdiablo Feb 21 '11 at 02:01
  • Nonetheless, I think the point that `malloc()` is required to return 16-byte aligned memory if `long long` requires 16 byte alignment, but *not* if `__m128` does, is subtle and was worth a comment. – caf Feb 21 '11 at 02:01
  • @caf, paxdiablo: in practice the two most common reasons to want 16-aligned memory are: (a) to use it for SSE types, and (b) because you're a dynamic linker, and the image you're loading calls for specific-alignment, probably because it wants to align function entries, basic blocks or other code on cache lines. In neither case is there any particular reason to expect `malloc` to provide 16-alignment, and in neither case do you genuinely need a portable solution, because you're doing something inherently non-portable anyway. But it's nice to *have* a portable default implementation in the box. – Steve Jessop Feb 21 '11 at 02:01
  • @pax: I could take the benefit of the doubt but actually, I think I just misread the code. – Steve Jessop Feb 21 '11 at 02:03
  • By the way, your approach is good, but it needs cleaning up: binary `&` doesn't apply to pointer operands, and `s + 0x10` may wrap. – caf Feb 21 '11 at 02:04
  • No probs, @caf, added that to point 1. And yes, it's butt-ugly code :-) Which is why I said it was just a start. It should probably undergo quite a bit of testing and cleanup before becoming production ready. Hopefully the concept will survive that, though. – paxdiablo Feb 21 '11 at 02:06
  • +1, If I understand it correctly, does this mean that for the worst case, we will waste `size-of-alignment` in memory to satisfy alignment? – haxpor Feb 10 '19 at 12:31

Starting with C11, there is the function void *aligned_alloc( size_t alignment, size_t size );, whose parameters are:

alignment - specifies the alignment. Must be a valid alignment supported by the implementation.
size - number of bytes to allocate. An integral multiple of alignment.

Return value

On success, returns the pointer to the beginning of newly allocated memory. The returned pointer must be deallocated with free() or realloc().

On failure, returns a null pointer.

Example:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *p1 = malloc(10*sizeof *p1);
    printf("default-aligned addr:   %p\n", (void*)p1);
    free(p1);

    int *p2 = aligned_alloc(1024, 1024*sizeof *p2);
    printf("1024-byte aligned addr: %p\n", (void*)p2);
    free(p2);
}

Possible output:

default-aligned addr:   0x1e40c20
1024-byte aligned addr: 0x1e41000
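C11 has aligned_alloc() but no aligned counterpart to realloc(). realloc() does accept a pointer from aligned_alloc(), but nothing in the standard says the block it returns keeps the stricter alignment. For question 3, a minimal sketch is therefore allocate, copy, free. The helper names aligned_realloc and round_up_pow2, and the old_size parameter, are hypothetical: standard C has no way to query a block's size, so the caller has to track it. The size is rounded up to a multiple of the alignment because the original C11 text requires that for aligned_alloc().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: round n up to a multiple of align (a power of two).
   Returns 0 if the rounded value would overflow size_t. */
static size_t round_up_pow2(size_t n, size_t align) {
    if (n > SIZE_MAX - (align - 1)) return 0;
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical aligned "realloc": always allocates a new aligned block,
   copies min(old_size, new_size) bytes and frees the old block.
   old may be NULL. On failure the old block is left untouched. */
void *aligned_realloc(void *old, size_t old_size, size_t new_size, size_t align) {
    size_t rounded = round_up_pow2(new_size ? new_size : 1, align);
    if (rounded == 0) return NULL;
    void *p = aligned_alloc(align, rounded);
    if (p == NULL) return NULL;
    if (old != NULL) {
        memcpy(p, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return p;
}
```

This never resizes in place; it trades that for a guaranteed alignment on every platform that ships C11 aligned_alloc().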
Breno Leitão
  1. I'm not aware of any way of requesting that malloc return memory with stricter alignment than usual. As for "usual" on Linux, from man posix_memalign (which you can use instead of malloc() to get more strictly aligned memory if you like):

    GNU libc malloc() always returns 8-byte aligned memory addresses, so these routines are only needed if you require larger alignment values.

  2. You must free() memory using the same pointer returned by malloc(), posix_memalign() or realloc().

  3. Use realloc() as usual, including sufficient extra space so if a new address is returned that isn't already aligned you can memmove() it slightly to align it. Nasty, but best I can think of.
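Item 3 can be sketched on top of the padding-byte layout from the malloc16 answer above: let realloc() resize the underlying block, then memmove() the data if the new base address sits at a different offset from the next 16-byte boundary. The names malloc16, free16 and realloc16 are illustrative, not a library API, and as R.. points out below, this can copy the data twice when realloc() moves the block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout: 1..16 padding bytes, the last of which stores the padding length,
   followed by the 16-byte-aligned data area. */
static unsigned char *align16(unsigned char *base) {
    return (unsigned char *)(((uintptr_t)base + 16) & ~(uintptr_t)0xf);
}

void *malloc16(size_t s) {
    if (s > SIZE_MAX - 16) return NULL;
    unsigned char *base = malloc(s + 16);
    if (base == NULL) return NULL;
    unsigned char *p = align16(base);
    p[-1] = (unsigned char)(p - base);
    return p;
}

void free16(void *p) {
    if (p == NULL) return;
    unsigned char *q = p;
    free(q - q[-1]);
}

/* Hypothetical realloc16: realloc() may resize in place; if the block moves
   to an address with a different offset to the next 16-byte boundary, the
   data is shifted with memmove(). On failure the old block stays valid. */
void *realloc16(void *p, size_t s) {
    if (p == NULL) return malloc16(s);
    if (s > SIZE_MAX - 16) return NULL;
    unsigned char *q = p;
    size_t old_pad = q[-1];
    unsigned char *base = realloc(q - old_pad, s + 16);
    if (base == NULL) return NULL;
    unsigned char *np = align16(base);
    size_t new_pad = (size_t)(np - base);
    if (new_pad != old_pad)
        /* both pads are <= 16, so pad + s never exceeds the s + 16 bytes */
        memmove(np, base + old_pad, s);
    np[-1] = (unsigned char)new_pad;   /* written after the move */
    return np;
}
```

When realloc() extends the block in place, the padding is unchanged and nothing is moved, which is the case question 3 is hoping for.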

Tony Delroy
    Using `memmove` to realign it is worse than just allocating new memory to begin with. In most real-world cases you'll invoke **two** copy operations. – R.. GitHub STOP HELPING ICE Feb 21 '11 at 01:46
  • R: a good caution - it would certainly be worthwhile generating some system-specific statistics to see how often realloc happened in place to inform the choice. – Tony Delroy Feb 21 '11 at 02:13

You could write your own slab allocator to handle your objects. It could allocate pages at a time using mmap, maintain a cache of recently freed addresses for fast allocation, handle all the alignment for you, and give you the flexibility to move or grow objects exactly as you need. malloc is quite good for general-purpose allocation, but if you know your data layout and allocation needs, you can design a system that meets those requirements exactly.
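A very small version of that idea, assuming a system with anonymous mmap (Linux, the BSDs and macOS all provide MAP_ANONYMOUS): one mapping is carved into fixed-size slots rounded up to 16 bytes, and freed slots go on a free list for fast reuse. The slab type and slab_* functions are hypothetical; a real allocator would add more slabs on demand and handle several object sizes.

```c
#define _GNU_SOURCE              /* exposes MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical fixed-size slab backed by one anonymous mapping. */
typedef struct {
    unsigned char *base;   /* page-aligned start of the mapping */
    size_t slot_size;      /* object size rounded up to a multiple of 16 */
    size_t nslots;
    size_t next_unused;    /* slots [next_unused, nslots) never handed out */
    void *free_list;       /* each free slot stores a pointer to the next */
} slab;

int slab_init(slab *s, size_t obj_size, size_t nslots) {
    if (nslots == 0 || obj_size > SIZE_MAX - 15) return -1;
    size_t slot = (obj_size + 15) & ~(size_t)15;
    if (slot == 0) slot = 16;            /* a slot must hold the free-list link */
    if (slot > SIZE_MAX / nslots) return -1;
    void *m = mmap(NULL, slot * nslots, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return -1;
    s->base = m;
    s->slot_size = slot;
    s->nslots = nslots;
    s->next_unused = 0;
    s->free_list = NULL;
    return 0;
}

/* Page-aligned base + multiples of 16 => every slot is 16-byte aligned. */
void *slab_alloc(slab *s) {
    if (s->free_list != NULL) {          /* reuse a recently freed slot */
        void *p = s->free_list;
        s->free_list = *(void **)p;
        return p;
    }
    if (s->next_unused == s->nslots) return NULL;   /* slab exhausted */
    return s->base + s->slot_size * s->next_unused++;
}

void slab_free(slab *s, void *p) {
    *(void **)p = s->free_list;
    s->free_list = p;
}

void slab_destroy(slab *s) {
    munmap(s->base, s->slot_size * s->nslots);
}
```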

sarnold

The trickiest requirement is obviously the third one, since any malloc() / realloc() based solution is hostage to realloc() moving the block to a different alignment.

On Linux, you could use anonymous mappings created with mmap() instead of malloc(). Addresses returned by mmap() are by necessity page-aligned, and the mapping can be extended with mremap().
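A Linux-only sketch of that approach (mremap() and MREMAP_MAYMOVE are Linux extensions, hence _GNU_SOURCE). The wrapper names are hypothetical; the point is that the kernel grows the mapping in place when the following address range is free and otherwise moves it, and either way the result is page-aligned, so 16-byte alignment comes for free. The caller has to track the lengths, and each mapping costs at least one page, so this suits large buffers rather than many small ones.

```c
#define _GNU_SOURCE            /* for MAP_ANONYMOUS and mremap() on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map len bytes of zeroed, page-aligned anonymous memory; NULL on failure. */
void *map_buffer(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Resize a mapping: extended in place if possible, otherwise moved.
   Contents are preserved; on failure NULL is returned and the old
   mapping is still valid. */
void *remap_buffer(void *old, size_t old_len, size_t new_len) {
    void *p = mremap(old, old_len, new_len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}

void unmap_buffer(void *p, size_t len) {
    munmap(p, len);
}

int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15) == 0;
}
```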

caf
  1. Experiment on your system. On many systems (especially 64-bit ones), you get 16-byte aligned memory out of malloc() anyway. If not, you will have to allocate the extra space and move the pointer (by at most 8 bytes on almost every machine).

    For example, 64-bit Linux on x86/64 has a 16-byte long double, which is 16-byte aligned - so all memory allocations are 16-byte aligned anyway. However, with a 32-bit program, sizeof(long double) is 8 and memory allocations are only 8-byte aligned.

  2. Yes - you can only free() the pointer returned by malloc(). Anything else is a recipe for disaster.

  3. If your system does 16-byte aligned allocations, there isn't a problem. If it doesn't, then you'll need your own reallocator, which does a 16-byte aligned allocation and then copies the data - or that uses the system realloc() and adjusts the realigned data when necessary.
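For the "experiment on your system" step, something like the following reports the weakest alignment malloc() actually produced in a run, next to what C11 promises through max_align_t. The function names are hypothetical, and as Steve Jessop's comment below explains, a clean result from a small experiment does not prove the allocator will always behave that way.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest power of two (capped at 4096) that divides the address. */
size_t address_alignment(const void *p) {
    uintptr_t a = (uintptr_t)p;
    size_t align = 1;
    while (align < 4096 && (a & (2 * align - 1)) == 0)
        align *= 2;
    return align;
}

/* Keep up to 256 blocks of assorted sizes alive at once, so the allocator
   cannot keep handing back the same address, and return the weakest
   alignment observed. An experiment, not a guarantee. */
size_t weakest_malloc_alignment(int n) {
    void *blocks[256];
    size_t weakest = 4096;
    int count = 0;
    if (n > 256) n = 256;
    for (; count < n; count++) {
        blocks[count] = malloc((size_t)(count % 64) * 8 + 1);
        if (blocks[count] == NULL) break;
        size_t a = address_alignment(blocks[count]);
        if (a < weakest) weakest = a;
    }
    while (count > 0) free(blocks[--count]);
    return weakest;
}

/* C11: malloc'd memory suits any type with fundamental alignment,
   and max_align_t has the largest fundamental alignment. */
size_t guaranteed_malloc_alignment(void) {
    return _Alignof(max_align_t);
}
```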

Double check the manual page for your malloc(); there may be options and mechanisms to tweak it so it behaves as you want.

On Mac OS X, there are posix_memalign() and valloc() (which gives a page-aligned allocation), and there is a whole series of 'zoned malloc' functions, documented under man malloc_zone_malloc, whose header is <malloc/malloc.h>.
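posix_memalign() is also available on Linux and FreeBSD. A minimal wrapper, assuming a POSIX system (alloc_aligned and is_aligned are hypothetical names): the alignment must be a power of two and a multiple of sizeof(void *), failure is reported through the return value rather than errno, and the block is released with ordinary free().

```c
#define _POSIX_C_SOURCE 200112L   /* declares posix_memalign under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns a block of size bytes aligned to alignment, or NULL on failure
   (e.g. EINVAL for a bad alignment, ENOMEM when out of memory). */
void *alloc_aligned(size_t alignment, size_t size) {
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;   /* release with plain free() */
}

int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
```

Note that realloc() on such a block is not promised to keep the stricter alignment, so resizing still needs one of the approaches from the other answers.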

Jonathan Leffler
    "Experiment on your system". Interesting fact that I tripped over once on Windows, many years ago. If your allocator is in fact a sub-allocator, then it may appear in simple experiments to return only 16-aligned values. It maintains 16-alignment, and the *first* allocation back from Windows in a process was always 16-aligned, for reasons to do with virtual memory. The *second* allocation might not be, so once you've used up the sub-allocator's first block, you have a 50% chance of less-aligned allocations. Took nearly a week to debug that in someone else's code in my first year as a pro. – Steve Jessop Feb 21 '11 at 02:17
  • 1
    ... the "someone else" had left the company, and the code in question was shifting pointers down a few bits to free up space at the top for flags (which is not quite as evil as it sounds since the whole system was by necessity ludicrously optimized for memory use and the code in question was a bignum library). But the code contained no comments, which was as evil as it sounds. – Steve Jessop Feb 21 '11 at 02:25

You might be able to jimmy (in Microsoft VC++ and maybe other compilers):

#pragma pack(16)

such that malloc( ) is forced to return a 16-byte-aligned pointer. Something along the lines of:

ptr_16byte = malloc( 10 * sizeof( my_16byte_aligned_struct ));

If it worked at all for malloc( ), I'd think it would work for realloc( ) just as well.

Just a thought.

-- pete

Pete Wilson