
I am working on a single-producer, single-consumer ring buffer implementation. I have two requirements:

  1. Align a single heap allocated instance of a ring buffer to a cache line.
  2. Align a field within a ring buffer to a cache line (to prevent false sharing).

My class looks something like:

#define CACHE_LINE_SIZE 64  // To be used later.

template<typename T, uint64_t num_events>
class RingBuffer {  // This needs to be aligned to a cache line.
public:
  ....

private:
  std::atomic<int64_t> publisher_sequence_ ;
  int64_t cached_consumer_sequence_;
  T* events_;
  std::atomic<int64_t> consumer_sequence_;  // This needs to be aligned to a cache line.

};

Let me first tackle point 1 i.e. aligning a single heap allocated instance of the class. There are a few ways:

  1. Use the C++11 alignas(..) specifier:

    template<typename T, uint64_t num_events>
    class alignas(CACHE_LINE_SIZE) RingBuffer {
    public:
      ....
    
    private:
      // All the private fields.
    
    };
    
  2. Use posix_memalign(..) + placement new(..) without altering the class definition. This suffers from not being platform independent:

    void* buffer;
    if (posix_memalign(&buffer, 64, sizeof(processor::RingBuffer<int, kRingBufferSize>)) != 0) {
        perror("posix_memalign did not work!");
        abort();
    }
    // Use placement new on a cache aligned buffer.
    auto ring_buffer = new(buffer) processor::RingBuffer<int, kRingBufferSize>();
    
  3. Use the GCC/Clang extension __attribute__ ((aligned(#)))

    template<typename T, uint64_t num_events>
    class RingBuffer {
    public:
      ....
    
    private:
      // All the private fields.
    
    } __attribute__ ((aligned(CACHE_LINE_SIZE)));
    
  4. I tried to use the C11 standardized aligned_alloc(..) function instead of posix_memalign(..), but GCC 4.8.1 on Ubuntu 12.04 could not find the definition in stdlib.h.

Are all of these guaranteed to do the same thing? My goal is cache-line alignment, so any method that has some limit on alignment (say double word) will not do. Platform independence, which would point to using the standardized alignas(..), is a secondary goal.

I am not clear on whether alignas(..) and __attribute__((aligned(#))) have some limit which could be below the cache line size on the machine. I can't reproduce this any more, but while printing addresses I think I did not always get 64-byte aligned addresses with alignas(..). By contrast, posix_memalign(..) seemed to always work. Again, I cannot reproduce this any more, so maybe I was making a mistake.
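
One way to check this kind of thing without reading printed addresses by eye is a small predicate on the pointer value. The following is only a sketch: `is_aligned`, `Sample` and `g_sample` are hypothetical names that do not appear in the question, and 64 is the assumed cache-line size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the question): true when `p` lies on an
// `alignment`-byte boundary. `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Stand-in for the ring buffer, over-aligned with the standard specifier.
struct alignas(64) Sample {
    std::int64_t value;
};

// An object with static storage duration, as one case to test.
static Sample g_sample;
```

Calling `is_aligned(&g_sample, 64)` and the same check on a pointer obtained from `new Sample` would show whether a given compiler and allocator honor the request for each kind of storage.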

The second aim is to align a field within a class/struct to a cache line. I am doing this to prevent false sharing. I have tried the following ways:

  1. Use the C++11 alignas(..) specifier:

    template<typename T, uint64_t num_events>
    class RingBuffer {  // This needs to be aligned to a cache line.
      public:
      ...
      private:
        std::atomic<int64_t> publisher_sequence_ ;
        int64_t cached_consumer_sequence_;
        T* events_;
        std::atomic<int64_t> consumer_sequence_ alignas(CACHE_LINE_SIZE);
    };
    
  2. Use the GCC/Clang extension __attribute__ ((aligned(#)))

    template<typename T, uint64_t num_events>
    class RingBuffer {  // This needs to be aligned to a cache line.
      public:
      ...
      private:
        std::atomic<int64_t> publisher_sequence_ ;
        int64_t cached_consumer_sequence_;
        T* events_;
        std::atomic<int64_t> consumer_sequence_ __attribute__ ((aligned (CACHE_LINE_SIZE)));
    };
    

Both of these methods seem to place consumer_sequence_ at an offset of 64 bytes from the beginning of the object, so whether consumer_sequence_ is actually cache aligned depends on whether the object itself is cache aligned. My question here: are there any better ways to do the same?
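
The observation about offsets can be checked at compile time. This is a simplified sketch rather than the real class: `Fields` and `kCacheLineSize` are hypothetical names, and plain integers stand in for the std::atomic members so that offsetof is unambiguously usable.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineSize = 64;  // assumed cache-line size

// Simplified stand-in for the RingBuffer fields.
struct Fields {
    std::int64_t publisher_sequence;
    std::int64_t cached_consumer_sequence;
    void* events;
    alignas(kCacheLineSize) std::int64_t consumer_sequence;
};

// The member starts on a multiple of 64 relative to the object start...
static_assert(offsetof(Fields, consumer_sequence) % kCacheLineSize == 0,
              "member offset");
// ...and the strictest member alignment becomes the alignment of the type,
static_assert(alignof(Fields) == kCacheLineSize, "type alignment");
// so the size is padded up to a multiple of the alignment as well.
static_assert(sizeof(Fields) % kCacheLineSize == 0, "size is padded");
```

Because the alignment of the type itself becomes 64, static and automatic instances should be placed on a cache-line boundary by a conforming compiler. Heap instances still depend on the allocation function returning suitably aligned storage.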

EDIT:

The reason aligned_alloc did not work on my machine was that I was on eglibc 2.15 (Ubuntu 12.04). It worked on a later version of eglibc.

From the man page: The function aligned_alloc() was added to glibc in version 2.16.

This makes it pretty useless for me since I cannot require such a recent version of eglibc/glibc.

phuclv
Rajiv
  • Great question, see Michael Spencer's [BoostCon 2013 talk](http://www.youtube.com/watch?v=uSZFrmhayIM). I don't think you can portably align to more than 16 bytes (so 64 byte cache line and even larger alignment to virtual memory pages is not supported by the Standard). – TemplateRex Dec 26 '13 at 22:04
  • @TemplateRex Thank you for the link. The talk seems relevant + 1. – Rajiv Dec 26 '13 at 22:41

4 Answers


Unfortunately the best I have found is allocating extra space and then using the "aligned" part. So the RingBuffer new can request an extra 64 bytes and then return the first 64-byte aligned part of that. It wastes space but will give the alignment you need. You will likely need to store the actual allocation address in the memory just before the returned pointer so that it can be deallocated later.

[Memory returned][ptr to start of memory][aligned memory][extra memory]

(assuming no inheritance from RingBuffer) something like:

void * RingBuffer::operator new(size_t request)
{
     static const size_t ptr_alloc = sizeof(void *);
     static const size_t align_size = 64;
     static const size_t request_size = sizeof(RingBuffer)+align_size;
     static const size_t needed = ptr_alloc+request_size;

     void * alloc = ::operator new(needed);
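     // std::align takes its pointer and space arguments by non-const
     // reference, so both must be lvalues; arithmetic on void* is also
     // not standard C++, hence the cast to char*.
     void * ptr = static_cast<char *>(alloc) + ptr_alloc;
     size_t space = request_size;
     std::align(align_size, sizeof(RingBuffer), ptr, space);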

     ((void **)ptr)[-1] = alloc; // save for delete calls to use
     return ptr;  
}

void RingBuffer::operator delete(void * ptr)
{
    if (ptr) // 0 is valid, but a noop, so prevent passing negative memory
    {
           void * alloc = ((void **)ptr)[-1];
           ::operator delete (alloc);
    }
}

For the second requirement, having a data member of RingBuffer also 64-byte aligned: if you know that the start of `this` is aligned, you can pad to force the alignment of data members.
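
A minimal sketch of that padding idea, assuming a 64-byte cache line and the field order from the question. `PaddedFields`, `pad` and `kCacheLine` are hypothetical names, and plain integers stand in for the atomics.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size

// Hypothetical layout: explicit padding fills the rest of the first cache
// line, so consumer_sequence begins exactly one line into the object.
struct PaddedFields {
    std::int64_t publisher_sequence;
    std::int64_t cached_consumer_sequence;
    void* events;
    char pad[kCacheLine - 2 * sizeof(std::int64_t) - sizeof(void*)];
    std::int64_t consumer_sequence;
};

static_assert(offsetof(PaddedFields, consumer_sequence) == kCacheLine,
              "consumer_sequence should start the second cache line");
```

Unlike a member marked with alignas, padding does not raise the alignment of the type, so this only helps when the object itself starts on a cache-line boundary, for example because it came from the operator new above.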

Glenn Teitelbaum
  • This definitely seems like a more standard way of doing it, with the caveat that any alignment request over 16 bytes is not required by the standard. I'll accept it since this seems more portable than my posix_memalign(..) solution. – Rajiv Dec 27 '13 at 20:18
  • Your saving of `alloc` to use with `delete` should use `void*`, no? – Ben Voigt Dec 27 '13 at 20:40
  • "((void **)ptr)[-1] = alloc;" - isn't this compiler dependent? – Stefan Monov Dec 28 '17 at 20:47
  • @StefanMonov I'm not sure why it would be compiler dependent. `ptr` points to at least `sizeof(void *)` bytes past `alloc`, so `ptr[-1]` should still be >= `alloc` – Glenn Teitelbaum Dec 29 '17 at 18:42
  • @GlennTeitelbaum: Ah my bad, sorry :) – Stefan Monov Dec 29 '17 at 21:56
  • @GlennTeitelbaum Nitpicking: The ptr argument (3rd argument) of `std::align()` is a reference to a pointer. So doesn't this code pass in a reference to a temporary object (`alloc+ptr_alloc`)? i.e. it should be `void* ptr = alloc+ptr_alloc; ptr = std::align(align_size, sizeof(RingBuffer), ptr, request_size);` – user673679 Feb 22 '18 at 11:50
  • @user673679 Why do you think in this case a reference to a temporary object is an issue? The lifespan seems confined to inside the function call. – Glenn Teitelbaum Mar 06 '18 at 22:01

The answer to your problem is std::aligned_storage. It can be used top level and for individual members of a class.
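
The answer gives no code, so the following is only a guess at what such usage might look like for individual members. `SequenceSlot` and `Sequences` are hypothetical names and 64 is the assumed cache-line size. Two caveats: alignments larger than alignof(std::max_align_t) are only conditionally supported, and std::aligned_storage is deprecated as of C++23.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>

// Raw, uninitialized storage for one int64_t, aligned to an assumed
// 64-byte cache line. Verify that the target compiler supports this.
typedef std::aligned_storage<sizeof(std::int64_t), 64>::type SequenceSlot;

// Hypothetical holder: each sequence lives in its own aligned slot.
struct Sequences {
    SequenceSlot publisher;
    SequenceSlot consumer;

    Sequences() {
        // aligned_storage does not construct anything; placement new does.
        new (&publisher) std::int64_t(0);
        new (&consumer) std::int64_t(0);
    }

    std::int64_t& consumer_value() {
        return *reinterpret_cast<std::int64_t*>(&consumer);
    }
};

static_assert(alignof(SequenceSlot) == 64, "slot alignment");
static_assert(alignof(Sequences) == 64, "holder inherits the alignment");
```

As with alignas on a member, this fixes offsets and the alignment requirement of the type; it does not by itself make a heap allocation return 64-byte aligned storage.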

rubenvb

After some more research my thoughts are:

  1. As @TemplateRex pointed out, there does not seem to be a standard way to align to more than 16 bytes. So even if we use the standardized alignas(..) there is no guarantee unless the alignment boundary is less than or equal to 16 bytes. I'll have to verify that it works as expected on a target platform.

  2. __attribute__ ((aligned(#))) or alignas(..) cannot be used to align a heap allocated object, as I suspected, i.e. new() doesn't do anything with these annotations. They seem to work for static objects or stack allocations with the caveats from (1).

    Either posix_memalign(..) (non standard) or aligned_alloc(..) (standardized but couldn't get it to work on GCC 4.8.1) + placement new(..) seems to be the solution. My solution for when I need platform independent code is compiler specific macros :)

  3. Alignment for struct/class fields seems to work with both __attribute__ ((aligned(#))) and alignas() as noted in the answer. Again I think the caveats from (1) about guarantees on alignment stand.

So my current solution is to use posix_memalign(..) + placement new(..) for aligning a heap allocated instance of my class since my target platform right now is Linux only. I am also using alignas(..) for aligning fields since it's standardized and at least works on Clang and GCC. I'll be happy to change it if a better answer comes along.
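
For completeness, here is a sketch of the full lifetime of an object created this way, since placement new has no matching delete expression. It assumes a POSIX system. `Widget`, `create_aligned_widget` and `destroy_aligned_widget` are hypothetical names, with `Widget` standing in for the ring buffer type.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <stdlib.h>  // posix_memalign and free (POSIX, not ISO C++)

struct Widget {  // hypothetical stand-in for RingBuffer<int, kRingBufferSize>
    std::int64_t sequence;
    Widget() : sequence(0) {}
};

// Allocates cache-line-aligned storage and constructs a Widget in it.
// Returns nullptr if the allocation fails.
Widget* create_aligned_widget() {
    void* buffer = nullptr;
    if (posix_memalign(&buffer, 64, sizeof(Widget)) != 0) {
        return nullptr;
    }
    return new (buffer) Widget();
}

// Run the destructor explicitly, then hand the storage back with free(),
// which is the documented way to release posix_memalign memory.
void destroy_aligned_widget(Widget* w) {
    if (w != nullptr) {
        w->~Widget();
        free(w);
    }
}
```

If the constructor could throw, the buffer would also need to be freed on that path; the sketch leaves that out because `Widget` cannot throw.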

phuclv
Rajiv

I don't know if it is the best way to align memory allocated with a new operator, but it is certainly very simple!

This is the way it is done in the thread sanitizer pass in GCC 6.1.0

#define ALIGNED(x) __attribute__((aligned(x)))

static char myarray[sizeof(myClass)] ALIGNED(64) ;
myClass *var = new(myarray) myClass;

Well, in sanitizer_common/sanitizer_internal_defs.h, it is also written

// Please only use the ALIGNED macro before the type.
// Using ALIGNED after the variable declaration is not portable!        

So I do not know why ALIGNED is used after the variable declaration here. But that is another story.
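
The same idea can be written with the standard C++11 specifier placed at the front of the declaration, which avoids the question of where the macro may go. This is a sketch only: `MyClass`, `my_storage` and `construct_in_static_storage` are hypothetical names, and 64 is the assumed cache-line size.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

struct MyClass {  // hypothetical stand-in for the answer's myClass
    int value;
    MyClass() : value(42) {}
};

// Static buffer sized for one MyClass and aligned to 64 bytes.
alignas(64) static unsigned char my_storage[sizeof(MyClass)];

// Constructs the object inside the aligned static buffer.
MyClass* construct_in_static_storage() {
    return new (my_storage) MyClass;
}
```

Because the storage is static, only one such object can live in the buffer at a time, and its destructor must be called explicitly if it does anything non-trivial.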

Hugo