I was wondering what the difference is between these two intrinsic functions. The Intel Intrinsics Guide doesn't help much:

  • _mm_storeu_si128: Store 128-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
  • _mm_loadu_si128: Load 128-bits of integer data from memory into dst. mem_addr does not need to be aligned on any particular boundary.

The only difference is the word "store" versus "load", but the distinction is not clear to me.

Peter Cordes
Tom Clabault
  • `Store` moves data from a register to memory, so here you move 128 bits from a register to memory. `Load` is the opposite: you move 128 bits from memory into a register. – Paul Ankman Jul 10 '18 at 13:21
  • So a register is a variable like __m128i / __m64 / __m512 / ..., and memory is unsigned char, unsigned int, etc.? If I want to fill a __m128i structure with my 16 unsigned chars, I use _mm_loadu_si128, and the opposite is _mm_storeu_si128, which fills an unsigned char array from the __m128i structure. Am I right? – Tom Clabault Jul 10 '18 at 16:00

1 Answer

In C terms:

  • load = read data pointed to by a pointer.

  • store = write through a pointer.

For a simple type like int, aligned load and store functions could look like this:

int load(int *p) { return *p; }

void store(int *p, int val) { *p = val; }

(You'd actually use memcpy to get unaligned and strict-aliasing-safe loads and stores.)
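As a rough sketch of that memcpy idiom (the names load_u32 and store_u32 are made up for illustration, and uint32_t stands in for int), unaligned-safe and strict-aliasing-safe helpers could look something like this:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes from any address, aligned or not. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* copies the object representation; no pointer cast needed */
    return v;
}

/* Hypothetical helper: write 4 bytes to any address, aligned or not. */
static inline void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Optimizing compilers typically turn a small fixed-size memcpy like this into a single load or store instruction, so there's usually no function-call overhead.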

The __m128i load/store intrinsics mostly exist to tell the compiler whether an access is aligned or unaligned, as opposed to dereferencing a __m128i* directly (which implies an aligned access). For float / double, they also avoid casts, because _mm_loadu_ps takes a const float* arg.

Later Intel intrinsics take void* args, avoiding the need for a cast like _mm_loadu_si128((const __m128i*)&my_struct), but unfortunately Intel didn't make that improvement until the AVX-512 intrinsics.
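To tie this back to the question in the comments, here is a minimal sketch that assumes SSE2 is available. It loads 16 unsigned chars into a __m128i with _mm_loadu_si128, does some work, and writes the result back to an unsigned char array with _mm_storeu_si128. The function name add_one_16 is invented for this example; the casts are the ones the pre-AVX-512 intrinsics require.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Illustrative only: add 1 to each of 16 bytes. src and dst may be unaligned. */
void add_one_16(const unsigned char *src, unsigned char *dst)
{
    /* load: read 16 bytes through a pointer into a __m128i variable */
    __m128i v = _mm_loadu_si128((const __m128i *)src);

    /* byte-wise add with wraparound */
    v = _mm_add_epi8(v, _mm_set1_epi8(1));

    /* store: write the 16 bytes of the __m128i variable through a pointer */
    _mm_storeu_si128((__m128i *)dst, v);
}
```

If both pointers were known to be 16-byte aligned, _mm_load_si128 / _mm_store_si128 would communicate that to the compiler instead.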


In asm terms, a load reads data from memory into a register (or as a source operand for an ALU instruction). A store writes data to memory.

C local variables are normally kept in registers, but of course your compiler is free to optimize intrinsic loads/stores the same way it can optimize dereferences of an int *. For example, it might optimize away a store/reload, so the asm wouldn't contain any instruction for it.

The fact that there are load and store intrinsics does not mean that __m128i "is a register". It's like int: if/when it can be kept in a register, the compiler will do so, but you can also make an array of __m128i or whatever. Load/store intrinsics can be optimized away, or a load can be folded into a memory source operand for an ALU instruction like vpaddb.
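A small sketch of both points, again assuming SSE2; the function names are hypothetical and the asm mentioned in the comments is what a compiler might emit, not something guaranteed:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* With AVX enabled, a compiler may fold this unaligned load into the add,
 * e.g. something like vpaddb xmm0, xmm0, [rdi], with no separate load instruction.
 * Without AVX, legacy-SSE memory operands require alignment, so expect a
 * separate movdqu followed by paddb. */
__m128i add_bytes_from_mem(__m128i v, const void *p)
{
    return _mm_add_epi8(v, _mm_loadu_si128((const __m128i *)p));
}

/* __m128i is an ordinary object type, so an array of it lives in memory.
 * Dereferencing arr[i] directly is an aligned load. */
__m128i sum_vectors(const __m128i *arr, int n)
{
    __m128i acc = _mm_setzero_si128();   /* likely kept in a register */
    for (int i = 0; i < n; i++)
        acc = _mm_add_epi8(acc, arr[i]);
    return acc;
}
```

Looking at the compiler's asm output for functions like these, with and without AVX enabled, is the easiest way to see which loads survive as separate instructions.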


Peter Cordes