explanation to aligned malloc implementation

Question

This is not homework, this is purely for my own personal education.

I couldn't figure out how to implement an aligned malloc so looked online and found this website. For the ease of reading I will post the code below:

#include <stdlib.h>
#include <stdio.h>

void* aligned_malloc(size_t required_bytes, size_t alignment)
{
    void* p1; // original block
    void** p2; // aligned block
    int offset = alignment - 1 + sizeof(void*);
    if ((p1 = (void*)malloc(required_bytes + offset)) == NULL)
    {
       return NULL;
    }
    p2 = (void**)(((size_t)(p1) + offset) & ~(alignment - 1));
    p2[-1] = p1;
    return p2;
}

void aligned_free(void *p)
{
    free(((void**)p)[-1]);
}

void main (int argc, char *argv[])
{
    char **endptr;
    int *p = aligned_malloc (100, strtol(argv[1], endptr, 10));

    printf ("%s: %p\n", argv[1], p);
    aligned_free (p);
}

The implementation does work, but I honestly can't figure out how it works.

Here is what I can't understand:

Why we need an offset?
What does anding with ~(alignment - 1) accomplish
p2 is a double pointer. How come we can return it from a function that is supposed to return only a single pointer?
What is the general approach to solve this problem?

Any help is really appreciated.

EDIT

This is not a duplicate of How to allocate aligned memory only using the standard library? because I also need to know how to free aligned memory.

This only works if `aligned` is a power of 2 and it assumes your alignment is at least as large as required for `void*`. — Paul Hankin, Jun 29 '16 at 01:51
Also: `size_t` (in the line that sets `p2`) should be `uintptr_t`. There's no guarantee that `size_t` is large enough to represent pointer values. — Paul Hankin, Jun 29 '16 at 02:01
Possible duplicate of [How to allocate aligned memory only using the standard library?](http://stackoverflow.com/questions/227897/how-to-allocate-aligned-memory-only-using-the-standard-library) — Daniel Rudy, Jun 29 '16 at 03:15
@Daniel Rudy Proposed duplicate does well answer how to _allocate_ aligned memory. It does not address nor answer how to free that memory like this code attempts to do. In the proposed dupe, free'ing is done with the original pointer and its storage is not detailed. Here, code attempts to save/recover the original pointer in the allocated block. — chux - Reinstate Monica, Jun 29 '16 at 13:51
@PaulHankin In your first comment you said: `it assumes your alignment is at least as large as required for void*`. I'm not sure I understand this statement. Can you elaborate more? — flashburn, Jun 29 '16 at 18:11
@flashburn The line `p2[-1] = p1` requires that `p2` is aligned to whatever value is required to write `void*` values through it. But looking at the code again, I think I was mistaken that this alignment breaks if `aligned` is less than this value: `p2` always ends up with the same alignment guarantees as `malloc`. — Paul Hankin, Jun 29 '16 at 23:41
[This](https://embeddedartistry.com/blog/2017/02/22/generating-aligned-memory/) is a very well written tutorial about generating aligned memory. — vmemmap, Jun 27 '22 at 11:34

John Zwinck · Answer 1 · 2023-02-18T14:12:16.760

15

You need an offset if you want to support alignments beyond what your system's malloc() does. For example if your system malloc() aligns to 8 byte boundaries, and you want to align to 16 bytes, you ask for 8 bytes extra so you know for sure you can shift the result around to align it as requested. You also add sizeof(void*) to the size you pass to malloc() to leave room for bookkeeping.
~(alignment - 1) is what guarantees the alignment. For example if alignment is 16, then subtract 1 to get 15, aka 0xF, then negating it makes 0xFF..FF0 which is the mask you need to satisfy the alignment for any returned pointer from malloc(). Note that this trick assumes alignment is a power of 2 (which practically it normally would be, but there really should be a check).
It's a void**. The function returns void*. This is OK because a pointer to void is "A pointer to any type," and in this case that type is void*. In other words, converting void* to and from other pointer types is allowed, and a double-pointer is still a pointer.
The overall scheme here is to store the original pointer before the one that's returned to the caller. Some implementations of standard malloc() do the same thing: stash bookkeeping information before the returned block. This makes it easy to know how much space to reclaim when free() is called.

All that said, this sort of thing is usually not useful, because the standard malloc() returns the largest alignment on the system. If you need alignment beyond that, there may be other solutions, including compiler-specific attributes.

edited Feb 18 '23 at 14:12

answered Jun 29 '16 at 01:50

John Zwinck

239,568
38
324
436

1

It can be useful: aligning data with cache lines, and preparing data for weird hardware (eg: some specialist graphics hardware) are two that I've seen in the real world. – Paul Hankin Jun 29 '16 at 01:56
3

You might note in 2 that `alignment` must be a power of 2. Personally, I'd just use `%` rather than bit-twiddling here -- `malloc` is already relatively expensive and an extra divide isn't going to make any performance difference. – Paul Hankin Jun 29 '16 at 01:58
`and you want to align to 16 bytes, you ask for 15 bytes extra` I'm having trouble understanding this part. My brain seems to think I need extra 16 (not 15) to align to 16 bytes – KcFnMi Feb 18 '23 at 13:52
@KcFnMi: You actually only need to ask for 8 bytes extra, because there are only two cases if your malloc() guarantees 8 byte alignment. Case 1 is that you get an address like 24 which is 8 byte aligned but needs to be shifted to 32 for 16 byte alignment. Case 2 is you get an address like 48 which is both 8 and 16 byte aligned. Note that 50% of all possible addresses fall into each case. I'll edit the answer slightly. – John Zwinck Feb 18 '23 at 14:11
Well, anyway, I still stuck on why 15 instead of 16. Considering https://github.com/Avinash987/Coding/blob/master/Cracking-the-Coding-Interview-6th-Edition-189-Programming-Questions-and-Solutions.pdf, problem 12.10 on page 420. Let's suppose malloc returns 0 (the 1st position in memory) and we add 15 positions in memory, then we'll be at 15 which is not divisible by 16. – KcFnMi Feb 18 '23 at 14:21
@KcFnMi: You don't add 15 (or 8) to the *result address* from malloc, you add to the *input size* you pass to malloc. If malloc returns 3 (not actually possible, but just for example), and you want 16 byte alignment, you need to add 13 to the address malloc gave you, and so you need to have added at least 13 to the size you requested, otherwise the end of your data won't fit after adding 13 to the beginning address malloc gave you. – John Zwinck Feb 19 '23 at 08:17
"You don't add 15 (or 8) to the result address from malloc, you add to the input size you pass to malloc" yes, agree. And also agree to "If malloc returns 3 ..., and you want 16 byte alignment, you need to add 13". So, since we don't know beforehand the address malloc will return, we add extra_space. The extra_space should be equal to the alignment? Not equal to alignment -1? If malloc returns 0 ..., and you want 16 byte alignment, you need to add 16 – KcFnMi Feb 19 '23 at 08:29
@KcFnMi: If you need 16 byte alignment, you never need to add 16 to the address returned by malloc, because adding 16 would mean the original address was already aligned to 16 bytes. For example if you want 16 byte alignment and malloc returns 17, you need to add 15 to get address 32. But if malloc returns 16, you need to add zero. – John Zwinck Feb 19 '23 at 08:31
For example if you want 16 byte alignment and malloc returns 0, you need to add ? 16? I mean we need to ask a least the same (extra) as the alignment we want, isn't? And adjust after malloc returns. – KcFnMi Feb 19 '23 at 08:35
0 is already 16 byte aligned. Meaning `0 % 16` is zero. – John Zwinck Feb 19 '23 at 08:40
Oh, I see. That's why we ask extra alignment - 1, ok. Got it. – KcFnMi Feb 19 '23 at 08:42

elhadi dp ıpɐɥןǝ · Answer 2 · 2019-12-17T13:39:07.217

Suppose we need SZ bytes of aligned memory, let:

A is the alignment.
W is the CPU word size.
P is the memory returned by malloc.
SZ is the requested number of bytes to be allocated.

we will return (P + Y) in which (P + Y) mod A = 0

So, we should save the original pointer P to be able to free the memory later. In this case, we should allocate (SZ + W) bytes, but to let memory aligned, we will substruct Z bytes in which (P % A = Z) => (Z ∈ [0, A-1])

So the total memory to be allocated is:  SZ + W + MAX(Z) = SZ + W + A - 1

The pointer to be returned is P + Y = P + W + MAX(Z) - (P + W + MAX(Z)) mod A

WE HAVE: X - X mod A = INT(X / A) * A = X & ~(A - 1)

SO we can replace P + W + MAX(Z) - (P + W + MAX(Z)) mod A by (P + W + MAX(Z)) & ~(A - 1)

The memory to be returned is: (P + W + MAX(Z)) & ~(A - 1) = (P + W + A - 1) & ~(A - 1)

score 2 · Answer 3 · answered Jun 29 '16 at 02:00

implementation does work

Perhaps, but I wouldn't be too sure. IMO you'd be better off working from first principles. Right off the bat,

p1 = (void*)malloc

is a red flag. malloc returns void. In C, any pointer can be assigned from void *. Casting from malloc is usually considered bad form because any effect it has can only be bad.

Why we need an offset

The offset provides room to stash the pointer returned by malloc, used later by free.

p1 is retrieved from malloc. Later, it has to be provide to free to be released. aligned_malloc reserves sizeof(void*) bytes at p1, stashes p1 there, and returns p2 (the first "aligned" address in the block that p1 points to). Later, when the caller passes p2 to aligned_free, it converts p2 in effect to void *p2[], and fetches the original p1 using -1 as an index.

What does anding with ~(alignment - 1) accomplish

It's what puts p2 on the boundary. Say alignment is 16; alignment -1 is 15, 0xF. ~OxF is all bits on except the last 4. For any pointer P, P & ~0xF will be a multiple of 16.

p2 is a double pointer.

pointer schmointer. malloc returns void*. It's a block of memory; you address it as you will. You wouldn't blink at

char **args = calloc(7, sizeof(char*));

to allocate an array of 7 char * pointers, would you? The code picks some "aligned" location at least sizeof(void*) bytes from p1 and, for the purposes of free, treats it as void **.

What is the general approach

There is no one answer. Best is probably to use a standard (or popular) library. If you build atop malloc, allocating enough to keep the "real" pointer and returning an aligned one is pretty standard, although I would code it differently. The syscall mmap returns a page-aligned pointer, which will satisfy most criteria for "aligned". Depending on need, that might be better or worse than piggybacking on malloc.

@chux Do you mind elaborating more on why void** needs to be aligned. Why for this case it needs to be aligned? I think a concrete example might help. — flashburn, Jun 29 '16 at 18:15
@James K. Lowden I'm still unclear why returning a double pointer from a function that should return only a single pointer is not causing an error. Do you mind elaborating more on this. It looks like I'm missing something crucial about C but I can't quite understand what it is. — flashburn, Jun 29 '16 at 18:18
@chux I don't even get it if it is aligned for `char **` Can you elaborate why it needs to be aligned for `char **` — flashburn, Jun 29 '16 at 18:52
@chux `That value must be aligned for a char *`. Can you explain why `char *` should be aligned? Why not `int *` or just `int` or why not `short *`, why specifically `char *`? — flashburn, Jun 29 '16 at 19:45
@chux to move a discussion to chat we need to be prompted to do that. I'm not sure how to do it without it. I guess we could post more messages and then simply delete them. — flashburn, Jun 29 '16 at 19:48
@flashburn Chat in http://chat.stackoverflow.com/rooms/116018/alignment-requirements-in-c — chux - Reinstate Monica, Jun 29 '16 at 20:49

score 1 · Answer 4 · edited May 23 '17 at 12:16

I have a few issues with this code. I have compiled them into the below list:

p1 = (void*)malloc You do not cast the return value of malloc.
free(((void**)p)[-1]); You do not cast free.
if ((p1 = (void*)malloc(required_bytes + offset)) == NULL) Do not put an assignment inside the comparison of an if statement. I know a lot of people do this, but in my mind, that is just bad form and makes the code more difficult to read.

What they are doing here is storing the original pointer inside the allocated block. That means that only the aligned pointer gets returned to the user. The actual pointer that is returned by malloc, the user never sees. You have to keep that pointer though because free needs it to unlink the block from the allocated list and put it on the free list. At the head of every memory block, malloc puts some housekeeping information there. Things such and next/prev pointers, size, allocation status, etc.... Some debug versions of malloc use guard words to check if something overflowed the buffer. The alignment that is passed to the routine MUST be a power of 2.

When I wrote my own version of malloc for use in a pooled memory allocator, the minimum block size that I used was 8 bytes. So including the header for a 32-bit system, the total was 28 bytes (20 bytes for the header). On a 64-bit system, it was 40 bytes (32 bytes for the header). Most systems have increased performance when data is aligned to some address value (either 4 or 8 bytes on modern computer systems). The reason for this is because the machine can grab the entire word in one bus cycle if it is aligned. If not, then it requires two bus cycles to get the entire word, then it has to construct it. This is why compilers align variables on either 4 or 8 bytes. This means that the last 2 or 3 bits of the address bus are zero.

I know that there are some hardware constraints which requires more alignment than the default 4 or 8. Nvidia's CUDA system, if I remember correctly, requires things aligned to 256 bytes...and that's a hardware requirement.

This has been asked before though. See: How to allocate aligned memory only using the standard library?

Hope this helps.

Code uses `free(((void**)p)[-1]);` to find the original pointer. If "do not cast free" is followed, how would you code `aligned_free()`? — chux - Reinstate Monica, Jun 29 '16 at 13:41
@chux-ReinstateMonica The way that I have done it is subtract the header size from the pointer to repoint it to the house keeping data block. Then you can switch it to the free list using normal list routines. — Daniel Rudy, Jan 06 '20 at 19:40
How did code "subtract the header size from the pointer" without a cast? Subtracting from `void *` is UB. — chux - Reinstate Monica, Jan 06 '20 at 19:54

explanation to aligned malloc implementation

4 Answers4

Linked