Intel / ARM intrinsics equivalence

Question

I have a C application using Intel intrinsics like:

__m128 _mm_add_ps (__m128 a, __m128 b)
__m128 _mm_sub_ps (__m128 a, __m128 b)
__m128 _mm_mul_ps (__m128 a, __m128 b)
__m128 _mm_set_ps (float e3, float e2, float e1, float e0)
void _mm_store_ps (float* mem_addr, __m128 a)
__m128 _mm_load_ps (float const* mem_addr)

Now I am trying to modify my application so that it runs on ARMv8, using a simulator called Gem5. So I began looking for ARM intrinsics and found this manual: ARM® NEON™ Intrinsics Reference.

I found the arithmetic intrinsics, but I'm a little lost with the set, store, and load intrinsics.

Could anyone with experience with ARM intrinsics tell me the right equivalents?

I know that ARM and x86 are different architectures, of course, but there are certainly logical similarities that should let me port my application from x86 to ARM. — A.nechi, Aug 12 '16 at 13:55
[A porting guide and header file to convert SSE intrinsics to their ARM NEON equivalent](http://codesuppository.blogspot.com.tr/2015/02/sse2neonh-porting-guide-and-header-file.html) — aebudak, Aug 12 '16 at 14:35
A link from the "related" sidebar that's worth pointing out specifically: http://stackoverflow.com/questions/2851421/is-there-a-good-reference-for-arm-neon-intrinsics — Peter Cordes, Aug 14 '16 at 01:17
The only thing still troubling me is setting a vector, because I've written a macro like this: **#define SET_FLOAT32x4(dest, e3, e2, e1, e0){dest = { e3, e2, e1, e0}}** But I keep getting the error **expected expression before "{"**. — A.nechi, Aug 14 '16 at 12:29

Paul R · Accepted Answer · 2016-08-12T15:03:26.287


Here are a few equivalents to get you started:

SSE             ARM

__m128          float32x4_t     // vector of 4 x 32-bit floats
_mm_load_ps     vld1q_f32       // load float vector from memory
_mm_store_ps    vst1q_f32       // store float vector to memory
_mm_add_ps      vaddq_f32       // add float vectors

As for initialising a vector, as you might with e.g. _mm_set_ps in SSE, compilers such as gcc and clang allow you to do this in a slightly more C-like way with Neon data types, e.g.

const float32x4_t v = { 1.0f, 2.0f, 3.0f, 4.0f };

However, if your compiler does not support this method, then you may have to use equivalent Neon intrinsics.

edited Aug 12 '16 at 15:03

answered Aug 12 '16 at 14:26


Static initialisation like that isn't actually NEON intrinsics, it's a GCC extension (also supported by Clang). The only fully-portable intrinsic way to initialise a vector is `vld*` from an array of the scalar type. – Notlikethat Aug 12 '16 at 14:59
@Notlikethat: OK - thanks - I assumed it was just determined by the way the Neon SIMD types had been defined and didn't realise it was compiler-dependent - I'll update the answer. – Paul R Aug 12 '16 at 15:02

One part of the trouble is compilers that don't support the extension. The other is the layout of data on big-endian systems, which can get somewhat confusing when using both the GCC initialization syntax shown in this answer and the Neon intrinsics. Sticking consistently to one programming model (either GCC extensions or Neon intrinsics) is the best way to avoid confusion. – James Greenhalgh Aug 15 '16 at 10:11
