
I was reading some code in C++ and I read the following:

   CACHELINE = 64;
   ...

/* allocate the three matrices and align to cache lines */
    a = (double *)malloc(nmax*nmax*sizeof(double)+CACHELINE);
    b = (double *)malloc(nmax*nmax*sizeof(double)+CACHELINE);
    c = (double *)malloc(nmax*nmax*sizeof(double)+CACHELINE);
    a = (double *)(((unsigned long)a+CACHELINE)&~(CACHELINE-1));
    b = (double *)(((unsigned long)b+CACHELINE)&~(CACHELINE-1));
    c = (double *)(((unsigned long)c+CACHELINE)&~(CACHELINE-1));

Why does this code create matrices which are aligned with cache lines? I especially do not understand what this instruction does:

a = (double *)(((unsigned long)a+CACHELINE)&~(CACHELINE-1));

Thank you!

elena
  • Because of speed. – Hatted Rooster Jan 29 '18 at 16:50
  • @RickAstley I'll edit my question if that was not clear. I know that having data aligned with the cache lines gives better performance; what I don't understand is why that code produces matrices which are aligned with the cache line. Could you explain it to me? Thank you! – elena Jan 29 '18 at 16:53
  • Possible duplicate https://stackoverflow.com/questions/3928995/how-do-cache-lines-work – llllllllll Jan 29 '18 at 16:56

1 Answer


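It's pretty simple. malloc does not guarantee that the returned address is aligned to the cache-line size. Therefore, the code allocates some additional memory (+CACHELINE) and starts using it from the next address that is properly aligned. That address is calculated by the second group of assignments: adding CACHELINE and then clearing the low bits with &~(CACHELINE-1) yields a multiple of CACHELINE that still lies inside the allocated block.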

However, this is a terrible piece of code. For instance, it loses the address originally returned by malloc, so you can no longer free the memory. It also casts pointers to unsigned long, which is not safe: that type is not guaranteed to be wide enough to hold a pointer (uintptr_t exists for this purpose).
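To illustrate those two points, here is a minimal sketch of the same over-allocate-and-round-up idea that keeps the original pointer and uses uintptr_t. The names AlignedBlock, align_up, alloc_doubles and free_doubles are hypothetical helpers made up for this example, and CACHELINE = 64 is just the value assumed in the question. Note that align_up adds alignment-1 rather than a full CACHELINE, so an address that is already aligned is left where it is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t CACHELINE = 64;  // assumed cache-line size, as in the question

// Hypothetical helper type: stores the pointer malloc returned next to the aligned one.
struct AlignedBlock {
    void*   raw;      // what malloc returned; this is what must be passed to free
    double* aligned;  // first cache-line-aligned address inside the block
};

// Round addr up to the nearest multiple of alignment (which must be a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
}

// Allocate room for count doubles plus padding, then align inside the block.
inline AlignedBlock alloc_doubles(std::size_t count) {
    void* raw = std::malloc(count * sizeof(double) + CACHELINE);
    if (raw == nullptr) return {nullptr, nullptr};
    std::uintptr_t addr = align_up(reinterpret_cast<std::uintptr_t>(raw), CACHELINE);
    return {raw, reinterpret_cast<double*>(addr)};
}

// Release the block through the original pointer, not the aligned one.
inline void free_doubles(AlignedBlock& block) {
    std::free(block.raw);
    block = {nullptr, nullptr};
}
```

With this, something like `AlignedBlock a = alloc_doubles(nmax*nmax);` gives you `a.aligned` to work with and `free_doubles(a)` to clean up.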

There are other ways to allocate aligned storage, such as posix_memalign.
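A minimal sketch of the posix_memalign route, assuming a POSIX system (it is not standard C++). The helper names alloc_aligned_doubles and is_aligned are hypothetical, and CACHELINE = 64 is again the value from the question. The returned pointer can be passed to free directly, so nothing has to be remembered on the side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX, not standard C++)

constexpr std::size_t CACHELINE = 64;  // assumed cache-line size, as in the question

// True if p is a multiple of alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Returns a cache-line-aligned array of count doubles, or nullptr on failure.
// Release it with free().
inline double* alloc_aligned_doubles(std::size_t count) {
    void* p = nullptr;
    // The alignment must be a power of two and a multiple of sizeof(void*).
    if (posix_memalign(&p, CACHELINE, count * sizeof(double)) != 0) return nullptr;
    assert(is_aligned(p, CACHELINE));
    return static_cast<double*>(p);
}
```

Usage would look like `double* a = alloc_aligned_doubles(nmax*nmax);` followed later by `free(a);`.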


Example: Suppose you want to allocate 100 bytes of memory, but you allocate 100+64=164 instead. Say malloc returns the address 16, so you can use the bytes at addresses 16 to 179.

Now you need to calculate the address of the first byte in this range that is aligned to 64, which is 64 itself. This is calculated as (16+64)&~(64-1)=80&~63=64. So in the end you will use the bytes at addresses 64 to 163, which lie within the allocated range 16 to 179.
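The arithmetic above can be checked at compile time. This is only a sketch on plain integers; next_boundary is a hypothetical name for the expression from the question. The last two checks show why the padding is a full 64 bytes: an address that is already aligned gets pushed one whole line up, and the 100 bytes then end exactly at the end of the allocation.

```cpp
#include <cstdint>

// The expression from the question, applied to plain integers instead of pointers.
constexpr std::uint64_t next_boundary(std::uint64_t addr, std::uint64_t line) {
    return (addr + line) & ~(line - 1);
}

// 80 is 0b1010000; clearing the low six bits (&~63) leaves 0b1000000, i.e. 64.
static_assert(next_boundary(16, 64) == 64, "(16+64)&~63 = 80&~63 = 64");
static_assert(16 + 164 - 1 == 179, "last allocated byte");
static_assert(64 + 100 - 1 == 163, "last used byte");

// If malloc had returned 64, the expression would skip ahead to 128 ...
static_assert(next_boundary(64, 64) == 128, "an aligned address moves up a whole line");
// ... and the 100 used bytes would end exactly at the last allocated byte.
static_assert(128 + 100 - 1 == 64 + 164 - 1, "still inside the allocation");
```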

Daniel Langr
  • Thank you! So the same code (written in a better way, taking into account the things you said) in C++ would be: `a = new[nmax*nmax*sizeof(double)+CACHELINE]` and then `a = (double *)(((uintptr_t)a+CACHELINE)&~(CACHELINE-1));` right? – elena Jan 29 '18 at 17:12
  • I would use a function that allocates aligned storage directly, such as the `posix_memalign` I mentioned. See, e.g., here for more: https://stackoverflow.com/questions/6973995/dynamic-aligned-memory-allocation-in-c11. – Daniel Langr Jan 29 '18 at 17:13
  • Does it work also in C++? Because I have read (in the last 5secs) that it's a function belonging to C – elena Jan 29 '18 at 17:15
  • `posix_memalign` is not part of C++, but it will work on POSIX-based systems (Linux, etc.). On Windows, there is likely some other function. I don't know whether there is some portable way. There is `aligned_alloc` in C++17: http://en.cppreference.com/w/cpp/memory/c/aligned_alloc – Daniel Langr Jan 29 '18 at 17:17