
The documentation for _alloca() says here:

The _alloca routine returns a void pointer to the allocated space, which is guaranteed to be suitably aligned for storage of any type of object.

However, here it says:

_alloca is required to be 16-byte aligned and additionally required to use a frame pointer.

So it seems that in the first reference they forgot about 32-byte aligned AVX/AVX2 types like __m256d.

Another thing that confuses me is that the first page says _alloca() is deprecated, yet the replacement it suggests is a function that may allocate memory from the heap rather than the stack (which is unacceptable in my multi-threaded application).

So can someone tell me whether there is a modern way (perhaps in a newer C/C++ standard?) to allocate aligned memory on the stack?

Clarification 1: Please don't provide solutions that require the array size to be a compile-time constant. My function allocates a variable number of array items depending on a run-time parameter value.

Serge Rogatch
  • First, decide if you are asking about C or C++, though `_alloca` is not part of either of them. –  Oct 22 '17 at 20:42
  • `alloca` aligns its allocations to 16 bytes. If you need a different alignment, allocate more memory and align it yourself. – RbMm Oct 22 '17 at 20:44
  • Will `std::aligned_storage` work for your needs? You can specify the alignment as the second template parameter and it comes from the stack given the example implementation which uses `alignas`. http://en.cppreference.com/w/cpp/types/aligned_storage – Joe Oct 22 '17 at 20:45
  • What is `alignof(__m256d)`, for the benefit of people who don't have your platform extensions? – Kerrek SB Oct 22 '17 at 20:53
  • @KerrekSB, it was in the question: 32 bytes. – Serge Rogatch Oct 22 '17 at 20:54
  • @SergeRogatch: Yes, sure, but is that literally the value of the alignof expression, or something you got from somewhere else? – Kerrek SB Oct 22 '17 at 20:55
  • If you need to allocate 32-byte-aligned memory of size `cb`: `PBYTE pb = (PBYTE)alloca(31 + cb); pb = (PBYTE)((ULONG_PTR)(pb + 31) & ~31);` – RbMm Oct 22 '17 at 21:01
  • @SergeRogatch Are you sure the first reference is incorrect? They are perfectly consistent. Something guaranteed to fit any type can also be only guaranteed to align on a 16-byte boundary because there might exist machines that don't have any 32-byte aligned types. Obviously, on machines that do have types with 32-byte alignment requirement, it would be 32-byte aligned. – David Schwartz Oct 22 '17 at 21:03
  • @KerrekSB , I've double-checked with `std::cout << alignof(__m256d) << std::endl;`, it's 32. – Serge Rogatch Oct 22 '17 at 21:10
  • @Joe, no, `std::aligned_storage` doesn't seem eligible because it requires the array length to be compile-time constant. – Serge Rogatch Oct 22 '17 at 21:12
  • OK -- that's *extended alignment* then, and support for that is implementation-defined. `max_align_t` only gives you the largest non-extended alignment, so the phrase "for any type" is not quite accurate. You can use the `std::align` helper function to align memory manually. – Kerrek SB Oct 22 '17 at 21:25
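
The `std::align` suggestion in the last comment can be sketched as follows. This is only an illustration, not code from the thread: `carve_aligned` is a hypothetical helper name, and the raw buffer is assumed to come from wherever you like (a local array, or an over-allocated `_alloca()` block).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>   // std::align

// Hypothetical helper: carve an aligned region of `bytes` bytes out of the raw
// buffer [buf, buf + capacity). Returns nullptr if the region does not fit
// once the alignment padding has been skipped.
inline void* carve_aligned(void* buf, std::size_t capacity,
                           std::size_t alignment, std::size_t bytes) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
    void* p = buf;
    std::size_t space = capacity;
    // std::align bumps p up to the next multiple of `alignment` and shrinks
    // `space` by the padding it skipped; it returns nullptr when
    // `bytes` no longer fits in the remaining space.
    return std::align(alignment, bytes, p, space);
}
```

The raw buffer has to be over-allocated by up to `alignment - 1` bytes so that the aligned region always fits.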

4 Answers


Overallocate with _alloca(), then hand-align. Like this:

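const int align = 32;
// n is the requested size in bytes; over-allocate so an aligned block always fits
void *p = _alloca(n + align - 1);
// round the address up to the next multiple of align (uintptr_t is from <stdint.h>)
__m256d *pm = (__m256d *)((((uintptr_t)p + align - 1) / align) * align);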

Replace const with #define, if necessary.
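
A slightly fuller sketch of the same over-allocate-and-round-up idea, written so that it also builds outside MSVC. The `STACK_ALLOC` macro, `align_up` and `sum_of_squares` are hypothetical names introduced for illustration only. Note that the allocation must happen in the function that uses the memory, which is why a macro is used rather than a wrapper function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER)
#  include <malloc.h>            // _alloca
#  define STACK_ALLOC(bytes) _alloca(bytes)
#else
#  include <alloca.h>            // alloca (non-standard; assumed GCC/Clang on POSIX)
#  define STACK_ALLOC(bytes) alloca(bytes)
#endif

// Round an address up to the next multiple of `align` (must be a power of two).
inline void* align_up(void* p, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    const auto mask = static_cast<std::uintptr_t>(align) - 1;
    return reinterpret_cast<void*>((addr + mask) & ~mask);
}

// Illustrative function: uses a 32-byte-aligned stack buffer whose length is
// only known at run time. The buffer is released when the function returns.
double sum_of_squares(const double* in, std::size_t n, bool* was_aligned = nullptr) {
    constexpr std::size_t kAlign = 32;
    void* raw = STACK_ALLOC(n * sizeof(double) + kAlign - 1);  // over-allocate
    double* buf = static_cast<double*>(align_up(raw, kAlign));
    if (was_aligned)
        *was_aligned = reinterpret_cast<std::uintptr_t>(buf) % kAlign == 0;
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = in[i] * in[i];
        total += buf[i];
    }
    return total;
}
```

As with any stack allocation of run-time size, `n` should be kept small enough that the buffer cannot exhaust the thread's stack.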

Seva Alekseyev

_alloca() is certainly not a standard or portable way of handling alignment on the stack. Luckily in C++11 we got alignas and std::aligned_storage. Neither of these forces you to put anything on the heap, so they should work for your use case. For example, to align an array of structs to a 32 byte boundary:

#include <type_traits>

struct bar { int member; /*...*/ };
void fun() {
  std::aligned_storage<sizeof(bar), 32>::type array[16];
  auto bar_array = reinterpret_cast<bar*>(array);
}

Or if you just want to align a single variable on the stack to a boundary:

void bun() {
  alignas(32) bar b;
}

You can also use the alignof operator to get the alignment requirements for a given type.
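
A small sketch of checking these guarantees, assuming the same `bar` struct as above. `is_aligned`, `wide_bar` and `local_array_is_aligned` are hypothetical names used only for this illustration. Note that the array length here still has to be a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct bar { int member; };

// A type that carries its own 32-byte alignment requirement.
struct alignas(32) wide_bar { double v[4]; };

static_assert(alignof(bar) == alignof(int), "bar only needs int alignment");
static_assert(alignof(wide_bar) == 32, "alignas raises the type's alignment");

// Returns true if p is a multiple of `align` (which must be a power of two).
inline bool is_aligned(const void* p, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// The array lives on the stack; alignas(32) applies to the array object,
// so its first element starts on a 32-byte boundary.
bool local_array_is_aligned() {
    alignas(32) bar items[16];
    items[0].member = 1;
    return is_aligned(items, 32);
}
```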

Wijagels

C++11 introduced the alignof operator:

An alignof expression yields the alignment requirement of its operand type.

You can use it as follows:

struct s {};
typedef s __attribute__ ((aligned (64))) aligned_s;

std::cout << alignof(aligned_s); // Outputs: 64

Note: If your type's alignment is bigger than its size, the compiler won't let you declare arrays of that type (see more here):

error: alignment of array elements is greater than element size

But if your type's alignment is smaller than its size, you can safely allocate arrays:

aligned_s arr[32];
-- OR --
constexpr size_t arr_size = 32;
aligned_s arr[arr_size];

Compilers that support VLAs will allow those for the newly defined type as well.

Daniel Trugman
  • Does this approach allow non-constant array size? Array size changes at runtime between calls of the function where I need `_alloca()`. – Serge Rogatch Oct 22 '17 at 20:55
  • *cl* does not support non-constant array sizes – RbMm Oct 22 '17 at 21:11
  • @SergeRogatch, Dynamic Arrays (a.k.a VLAs) were [considered as part of the standard](https://isocpp.org/blog/2013/04/trip-report-iso-c-spring-2013-meeting) but didn't make it. Though, G++ (4.6.3) and Clang (900.0.38) allow it. – Daniel Trugman Oct 22 '17 at 21:14

The "modern" way is:

Don't make variable-length allocation on the stack.

In the context of your question - wanting to allocate on the heap but refraining from doing so - I'm assuming you may be allocating more than some small compile-time constant amount of memory. In that case, you're simply going to smash your stack with that alloca() call. Instead, use a thread-safe memory allocator. I'm sure there are libraries for this on GitHub (and at worst you could protect allocation calls with a global mutex, although that's slow if you need lots of them).

On the other hand, if you do know in advance what's the cap on the allocation size - just pre-allocate that much memory in thread-local storage; or use a fixed-size local array (which will get allocated on the stack).
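
A minimal sketch of the pre-allocated, thread-local variant, assuming the cap is known at compile time. `kMaxItems`, `thread_scratch` and `sum_of_squares_tls` are hypothetical names, and the cap of 1024 items is an arbitrary value chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxItems = 1024;  // assumed upper bound on the item count
constexpr std::size_t kAlign = 32;       // alignment wanted for AVX-sized loads

// Returns this thread's 32-byte-aligned scratch buffer if n items fit,
// or nullptr if n exceeds the cap. Each thread gets its own buffer, so no
// locking is needed and no heap allocation happens per call.
inline double* thread_scratch(std::size_t n) {
    alignas(kAlign) thread_local double scratch[kMaxItems];
    return n <= kMaxItems ? scratch : nullptr;
}

// Illustrative use: the item count varies at run time, the storage does not.
double sum_of_squares_tls(const double* in, std::size_t n) {
    double* buf = thread_scratch(n);
    assert(buf != nullptr && "n exceeds the assumed cap");
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = in[i] * in[i];
        total += buf[i];
    }
    return total;
}
```

The buffer is shared by every call on the same thread, so it is not suitable for recursive or re-entrant use without extra bookkeeping.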

einpoklum
  • *Don't make variable-length allocation on the stack* - why is that? – RbMm Oct 22 '17 at 22:16
  • Variable-length allocation is actually very efficient for relatively small blocks, if we do it in user mode and in our own exe file (so we know the stack size exactly and can set it at build time). We can usually allocate hundreds of thousands of bytes on the stack freely. A separate issue: the first time we do this and allocate several pages (4 KB each) or more, it will be slow compared to heap allocation (unless the guard page was deliberately moved down beforehand). And in the case of a stack overflow, the behavior is defined. (*All of this applies to Windows.*) – RbMm Oct 23 '17 at 07:24