
I've encountered a mysterious segmentation fault.

#include <stdio.h>
#include <stdlib.h>   /* malloc */
#include <immintrin.h>
struct Box{
    __m256i L;
};
int main()
{
    struct Box *result=NULL;
    result=(struct Box *)malloc(sizeof(struct Box));
    (*result).L=(*result).L;
}

Compiled with flags -msse4.2 -march=corei7-avx

It runs totally fine on my Mac (OS X El Capitan 10.11.6, GCC 4.8.4), but it gives me a segmentation fault on an Amazon EC2 machine (Ubuntu 14.04, GCC 4.8.4).

When I make a slight change:

#include <stdio.h>
#include <immintrin.h>
struct Box{
    __m256i L;
};
int main()
{
    struct Box result[1];
    (*result).L=(*result).L;
}

With this change, it runs fine on the Ubuntu machine.

Does anyone have an explanation for this?

Peter Cordes
Ruiyu Zhu

1 Answer


Alignment, Malloc Behavior, and Stack allocations

@Peter Cordes's comment addresses part of the problem.

Specifically, on your OS X machine you are "always" getting memory that is aligned well enough for the assembly generated for the __m256i datatype, which is supposed to be 32-byte aligned. (I put "always" in quotation marks because malloc does not guarantee 32-byte alignment. My first guess was that you simply got lucky with OS X's malloc, since repeated runs of the same program tend to get the same alignments, unlike repeated calls to malloc inside one program. As explained under "Instruction Differences" below, though, the real reason is that the compiler on OS X generates different asm.)

On Ubuntu, you are not getting a suitable alignment in the memory address returned by malloc. (See below for details on why.)
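To see which alignment malloc actually returns on a given system, a small check like the following may help. This is only a sketch: the helper name is_aligned is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the address p is a multiple of
   `alignment` bytes, 0 otherwise. `alignment` must be a nonzero
   power of two (for example 16 or 32). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power of two, the low bits of an aligned address are all zero. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Calling is_aligned(result, 32) right after the malloc in the question should show whether the pointer meets the 32-byte requirement of __m256i on that machine.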

You call your second code snippet a slight change.

int main()
{
    struct Box result[1];
    (*result).L=(*result).L;
}

It is actually quite different from the first snippet, which uses malloc, because the compiler (gcc here) is aware of the alignment requirements of the data type Box (and, by extension, __m256i) when it allocates the memory on the stack. Thus there is no risk of segfaulting in this case, because the compiler provides the correct alignment.

You can manipulate the base pointer returned by malloc as explained in this post https://stackoverflow.com/a/227900/3516034. I'll let you look there for the details but in short you can do something like

struct Box *result=NULL;
void *mem = malloc(2 * sizeof(struct Box));
result = (struct Box *)((uintptr_t)mem + offset);

where offset allows you to explore the alignment and segfaults. It may be useful for you to print out the pointer address you end up using for result like printf("0x%08" PRIXPTR "\n", (uintptr_t)result); (again from that post).
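If you want the offset that produces an aligned pointer rather than an arbitrary one, the usual trick is to round the address up to the next multiple of the alignment. Here is a minimal sketch, assuming the buffer was over-allocated by at least alignment - 1 bytes; the name align_up is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rounds the address p up to the next multiple of
   `alignment`, which must be a nonzero power of two. The caller must
   have allocated at least (alignment - 1) extra bytes so that the
   rounded pointer still lies inside the buffer. */
void *align_up(void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    /* Add (alignment - 1), then clear the low bits. */
    uintptr_t aligned = (addr + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
    return (void *)aligned;
}
```

With mem = malloc(sizeof(struct Box) + 31), the result of align_up(mem, 32) can be used as a struct Box *. Keep the original mem around, because that is the pointer that has to be passed to free.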

Instruction Differences

Lastly, I can reproduce this on Ubuntu and OSX. I actually see my OSX malloc calls giving 16 byte alignment and not 32 byte alignment. I also see 16 byte alignment on Ubuntu (in a VM on the same hardware) which is causing the segfault. When I manually align to 32 bytes, the segfault goes away.
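Instead of aligning by hand, you can also ask the allocator for 32-byte aligned memory directly. The sketch below uses posix_memalign, which is a standard POSIX function available on both Ubuntu and OS X; the wrapper name alloc_aligned32 is just something I made up for this example.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns `size` bytes aligned to 32 bytes,
   or NULL on failure. Release the memory with free(). */
void *alloc_aligned32(size_t size)
{
    void *p = NULL;
    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&p, 32, size) != 0)
        return NULL;
    /* Sanity check: the low five address bits must be zero. */
    assert(((uintptr_t)p & 31) == 0);
    return p;
}
```

In the question's program this would replace the malloc call, for example result = (struct Box *)alloc_aligned32(sizeof(struct Box)).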

So the root cause of your problem is that the systems are not generating the same assembly instructions. I got the assembly with the -S option to gcc. On OSX, I see vmovaps used 4 times with XMM operands, and on Ubuntu vmovdqa twice with YMM operands, to move a __m256i.

vmovdqa and vmovaps require their memory operands to be naturally-aligned (i.e. 32B for YMM, 16B for XMM). So the assembly generated on OSX only requires 16B alignment even though __m256i is 32B.

Community
Phil
  • I am pretty sure that the code ALWAYS runs properly on my Mac. That seems to contradict your theory: according to it, the Mac should also segfault, since malloc there gives 16-byte alignment, not 32-byte alignment. – Ruiyu Zhu Oct 13 '16 at 16:32
  • It is more complicated than that. As I explain in the "Instruction Differences" section, gcc on your OS X machine is generating assembly instructions that only require 16-byte alignment, even though the C datatype `__m256i` requires 32-byte alignment. This is why it works on your Mac: the resulting binary only needs 16-byte alignment, not 32-byte. The Ubuntu build generates instructions that require 32-byte alignment. – Phil Oct 13 '16 at 18:14
  • That explains everything. Thanks! – Ruiyu Zhu Oct 14 '16 at 23:20