7

I am trying to build an application which uses pthreads and the __m128 SSE type. According to the GCC manual, the default stack alignment is 16 bytes. Using __m128 requires 16-byte alignment.

My target CPU supports SSE. I use a GCC compiler which doesn't support runtime stack realignment (e.g. -mstackrealign). I cannot use any other GCC compiler version.

My test application looks like:

#include <xmmintrin.h>
#include <pthread.h>
void *f(void *x){
   __m128 y;
   ...
}
int main(void){
  pthread_t p;
  pthread_create(&p, NULL, f, NULL);
}

The application generates an exception and exits. After some simple debugging (printf "%p", &y), I found that the variable y is not 16-byte aligned.

My question is: how can I realign the stack properly (16-byte) without using any GCC flags and attributes (they don't help)? Should I use GCC inline Assembler within this thread function f()?

psihodelia
  • 2
    If you must use a particular gcc version, please include the gcc version (e.g. gcc 4.3.2 i386), and host/target OS (e.g. Debian 5.0 (lenny) Linux 2.6.26 i686). Knowing whether to suggest gcc 4.3 options versus 3.4 may make a difference. – mctylr May 04 '10 at 14:17

5 Answers

8

Allocate on the stack an array that is 15 bytes larger than sizeof(__m128), and use the first 16-byte-aligned address in that array. If you need several, allocate them in one array with a single 15-byte margin for alignment.

I do not remember whether allocating an unsigned char array makes you safe from strict aliasing optimizations by the compiler, or whether it only works the other way round.

#include <stdint.h>

void *f(void *x)
{
   unsigned char y[sizeof(__m128)+15];
   __m128 *py = (__m128 *)(((uintptr_t)y + 15) & ~(uintptr_t)15); /* round up to the next 16-byte boundary */
   ...
}
Pascal Cuoq
  • You also might want to examine whether the overall thread stack is being allocated with a 16-byte alignment. – Donal Fellows May 04 '10 at 13:08
  • Thanks, but what is ptr_t and why do you use &~15 ? – psihodelia May 04 '10 at 13:20
  • 7
    Unfortunately this forces the variable to be on the stack regardless of potential compiler optimisations (like keeping it in a register). – Paul R May 04 '10 at 13:27
  • I'm guessing it's meant to be `uintptr_t`, but either way it's just an integer type that's big enough to hold a pointer. – Paul R May 04 '10 at 13:43
  • @Paul R Right, I was looking for the right header file and I couldn't find it because I was misremembering the name. @psihodelia `&~15` means "round down to the multiple of 16 immediately inferior". – Pascal Cuoq May 04 '10 at 14:05
  • It doesn't work for me, because I have a lot of nested functions and local variables. – psihodelia May 04 '10 at 16:01
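
The rounding used in the answer above can be sketched as a small standalone helper. This is only an illustration: the name align_up is hypothetical and appears nowhere in the thread, and the sketch assumes the alignment is a power of two (16 for __m128).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): round p up to the next
   multiple of alignment. alignment must be a power of two. */
static void *align_up(void *p, size_t alignment)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)alignment - 1;

    /* Reject zero and values that are not a power of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Adding the mask steps past the boundary unless addr is already
       aligned; clearing the low bits then snaps back onto it. */
    return (void *)((addr + mask) & ~mask);
}
```

With a buffer that is 15 bytes larger than the object, as the answer suggests, the returned pointer always leaves room for the whole object inside the buffer.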
3

This shouldn't be happening in the first place, but to work around the problem you can try:

void *f(void *x)
{
   __m128 y __attribute__ ((aligned (16)));
   ...
}
Paul R
  • No, it doesn't help. The same problem. – psihodelia May 04 '10 at 12:51
  • My guess is you're doing this on Windows rather than a proper operating system ? There is some good info here on working around this problem: http://www.sourceware.org/ml/pthreads-win32/2008/msg00056.html – Paul R May 04 '10 at 13:25
  • 3
    It looks like this is a bug in old versions of gcc - it seems to have been fixed around 2004 - is there some reason why you can't use a more up-to-date toolchain ? – Paul R May 04 '10 at 13:45
  • Actually no, I cannot use another GCC version - we have a specific hardware/software environment. – psihodelia May 04 '10 at 14:03
  • I am trying to implement explicit stack adjustment using inline assembler. – psihodelia May 04 '10 at 14:04
3

Sorry to resurrect an old thread...

For those with a newer compiler than OP: OP mentions a -mstackrealign option, which led me to __attribute__((force_align_arg_pointer)). If your function is being optimized to use SSE, but %ebp is misaligned, this will do the runtime fixes for you if required, transparently. I also found out that this is only an issue on i386. The x86_64 ABI guarantees the arguments are aligned to 16 bytes.

__attribute__((force_align_arg_pointer)) void i_crash_when_not_aligned_to_16_bytes() { ... }

Cool article for those who might want to learn more: http://wiki.osdev.org/System_V_ABI

AStupidNoob
  • Thanks for this. It helped resolve a 32-bit x86 problem with [making `.so` files runnable as binaries](https://stackoverflow.com/a/68339111/14760867). It also helped me find the bug that discusses the [confusion about this stuff](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838#c91). – Andrew G Morgan Nov 14 '21 at 17:03
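
For compilers that do support it, the attribute from the answer above can be applied directly to a function with the signature that pthread_create() expects. The following is a minimal sketch, not something from the thread: REALIGN_STACK and thread_entry are hypothetical names, the attribute is guarded so it is only used on x86 targets, and the check is done on a plain float array rather than __m128 so the sketch does not depend on SSE headers. It would not help with OP's compiler, which lacks stack realignment support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* REALIGN_STACK is a hypothetical macro name. The attribute is an x86
   feature of GCC-compatible compilers, so it expands to nothing elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Same signature as a pthread start routine, so it could be passed to
   pthread_create(). The prologue realigns the stack if necessary. */
REALIGN_STACK
void *thread_entry(void *arg)
{
    /* Stand-in for an __m128 local: four floats that need 16-byte alignment. */
    volatile float v[4] __attribute__((aligned(16))) = {0.0f, 0.0f, 0.0f, 0.0f};

    (void)arg;
    assert(((uintptr_t)&v[0] & 15) == 0);
    return NULL;
}
```

Only the thread entry point needs the attribute in this sketch: once its frame is aligned, callees compiled with the default 16-byte stack boundary keep that alignment.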
2

Another solution would be to use a padding function, which first aligns the stack and then calls f. So instead of calling f directly, you call pad, which pads the stack first and then calls f with an aligned stack.

The code would look like this:

#include <xmmintrin.h>
#include <pthread.h>

#define ALIGNMENT 16

void *f(void *x) {
    __m128 y;
    // other stuff
}

void * pad(void *val) {
    unsigned int x; // to get the current address from the stack
    unsigned char pad[ALIGNMENT - ((unsigned int) &x) % ALIGNMENT];
    return f(val);
}

int main(void){
    pthread_t p;
    pthread_create(&p, NULL, pad, NULL);
}
ablaeul
-1

I have solved this problem. Here is my solution:

void another_function(){
   __m128 y;
   ...
}
void *f(void *x){
asm("pushl    %esp");
asm("subl    $16,%esp");
asm("andl    $-0x10,%esp");
another_function();
asm("popl %esp");
}

First, we grow the stack by 16 bytes. Second, we clear the least-significant nibble of the stack pointer. We preserve the stack pointer using the push/pop instructions. We then call another function, whose local variables are all 16-byte aligned. All nested functions will also have their local variables 16-byte aligned.

And it works!

psihodelia
  • 6
    Seriously. UPDATE YOUR COMPILER. Don't be proud of yourself for putting rube goldberg devices in your code. – Frank Krueger May 04 '10 at 16:05
  • 8
    This code appears to save ESP on the stack, then move ESP somewhere else, then pop ESP. This will cause a random value to be popped into ESP. Doesn't this cause a crash? Or are you using a calling convention where ESP is saved somewhere else, perhaps into EBP, and restored at the end, making that POP superfluous? – user9876 May 04 '10 at 16:08
  • 1) I cannot update GCC -> I have a specific run-time environment and a specific x86-compatible CPU. 2) No, why would it cause a crash? Saving ESP and then restoring it does not cause a crash or a random value. I have also tested the code above without pushl/popl and it works too. No special calling convention is used, and ESP is not saved anywhere else. – psihodelia May 05 '10 at 09:43
  • 5
    Like user9876 said - do you know what "pushl %esp" does? Conceptually, it works like this: `Memory[%esp] = %esp; %esp -= 4;` (depending on how your stack grows, it may be `+= 4`). Then a "popl %esp" essentially does: `%esp += 4; %esp = Memory[%esp];`. Now, if between the "push" and the "pop" you modified esp, the second memory access (the "pop") will read from a wrong address. The only reasonable explanation for why it works is that the compiler saves %esp somewhere else, too (e.g. in ebp?) in the prologue of function f(), and then restores it in the epilogue of f(). Thus, it hides your error. – Virgil Jun 07 '11 at 11:09
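
The objection raised in the comments above can be illustrated with a toy model in plain C. This is not real assembly and not code from the thread: mem, push, pop_into_esp and simulate are hypothetical names, the "stack" is a small zero-initialized array, and the model assumes a downward-growing stack with 4-byte slots as on i386.

```c
#include <assert.h>
#include <stdint.h>

/* Toy memory: 256 bytes, addressed in 4-byte words, all zero initially. */
enum { MEM_BYTES = 256 };
static uint32_t mem[MEM_BYTES / 4];

/* Models "pushl": move the stack pointer down one slot, store the value. */
static uint32_t push(uint32_t esp, uint32_t value)
{
    esp -= 4;
    mem[esp / 4] = value;
    return esp;
}

/* Models "popl %esp": the word at the current top of stack becomes esp. */
static uint32_t pop_into_esp(uint32_t esp)
{
    return mem[esp / 4];
}

/* Runs the sequence from the answer and returns the final esp.
   If realign is zero, the sub/and steps are skipped. */
static uint32_t simulate(uint32_t esp, int realign)
{
    esp = push(esp, esp);          /* pushl %esp        */
    if (realign) {
        esp -= 16;                 /* subl $16, %esp    */
        esp &= ~(uint32_t)15;      /* andl $-0x10, %esp */
    }
    return pop_into_esp(esp);      /* popl %esp         */
}
```

In this model, simulate(200, 0) gives back 200, while simulate(200, 1) reads a slot that was never written and returns whatever is stored there (zero here), which is the wrong-address read the commenters describe.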