
I'm trying to use SIMD intrinsics in a C program in Xcode 7.1. (Note: I am writing a C99 program, not a C++ program.)

I've included immintrin.h, and I've written several functions using intrinsics that work well. I'm now trying to write a function that sums the four floats in a __m128, as follows:

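#include <immintrin.h>

/* Sum the four floats in x. _mm_hadd_ps is an SSE3 intrinsic. */
float cimpl_sum_m128( __m128 x ){
  float out;
  __m128 sum = x;
  sum = _mm_hadd_ps( sum, sum );  /* (x0+x1, x2+x3, x0+x1, x2+x3) */
  sum = _mm_hadd_ps( sum, sum );  /* every lane now holds x0+x1+x2+x3 */
  out = _mm_cvtss_f32( sum );     /* extract the lowest lane as a scalar */
  return out;
}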

Xcode does not recognize the _mm_cvtss_f32 intrinsic. I should note that I got it from the Intel Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/.

Can anyone explain why Xcode doesn't recognize this intrinsic? If I can't use _mm_cvtss_f32, how do I extract a single value from a __m128 variable?

In the future, I'd like to use _mm256_cvtss_f32; is this possible? If not, how do I extract a single value from a __m256 variable?

user24205
  • 2
    Are you compiling with AVX enabled (`-mavx`, or maybe `-march=native`)? – nemequ Nov 19 '17 at 07:04
  • Not sure it is related, but with the new Swift, and the many changes between versions 2 and 3 (first of all, probably, that 3 is not backward compatible with code written in 2), there is maybe another way to do that "command"? – Déjà vu Nov 19 '17 at 07:24
  • 1
    @ringø: I don’t think this has anything to do with Swift (check the tags). – Paul R Nov 19 '17 at 08:37
  • 3
    Are you sure you have SSE enabled for `_mm_cvtss_f32`? You're not compiling in 32-bit mode with SSE disabled, are you? Linux Clang/LLVM supports both intrinsics just fine (https://godbolt.org/g/QitQgz). `_mm256_cvtss_f32` of course requires `-mavx` or a `-march` that enables `-mavx` indirectly. **What exact error message do you get?** – Peter Cordes Nov 19 '17 at 23:27
  • 2
    And BTW, `_mm_hadd_ps` costs 2 shuffles + 1 add when you could have used 1 shuffle + 1 add for each narrowing step. See https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86 **for an optimized `float hsum256_ps_avx(__m256 v)`** (and for `__m128` versions with AVX or various levels of SSE, and integer and `__m128d`). – Peter Cordes Nov 19 '17 at 23:29
  • @nemequ I am; several other functions involving AVX commands work just fine. – user24205 Nov 21 '17 at 00:02
  • It's possible the function just isn't included in XCode 7.1… I have no idea how old 7.1 is, but AVX is relatively new. grep around in your headers and see if you can find it. If you can, look for ifdefs which it might be hiding behind. – nemequ Nov 21 '17 at 01:45
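
To make the comments above concrete, here is a minimal sketch of two ways to get a scalar out of a `__m128` using only SSE1 intrinsics from `xmmintrin.h`. The function names `hsum_m128` and `m128_lane` are hypothetical and chosen for illustration; the shuffle sequence is one possible "1 shuffle + 1 add per step" reduction, not necessarily the exact one in the linked answer.

```c
#include <xmmintrin.h>  /* SSE1: __m128, _mm_cvtss_f32, _mm_movehl_ps, ... */

/* Hypothetical helper: horizontal sum of the four floats in v,
   using one shuffle and one add per narrowing step. */
float hsum_m128(__m128 v)
{
    __m128 shuf = _mm_movehl_ps(v, v);        /* (v2, v3, v2, v3) */
    __m128 sums = _mm_add_ps(v, shuf);        /* (v0+v2, v1+v3, ...) */
    shuf = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 1, 1, 1)); /* lane 1 in every lane */
    sums = _mm_add_ss(sums, shuf);            /* lane 0 = v0+v2+v1+v3 */
    return _mm_cvtss_f32(sums);               /* lowest lane as a scalar */
}

/* Hypothetical helper: read lane i (0..3) by storing the vector to memory.
   Simple and portable; not intended for hot loops. */
float m128_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}
```

If `_mm_cvtss_f32` still fails to resolve in a setup like this, the exact compiler error message and the target architecture settings are the first things to check.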

1 Answer


It turned out to be an unrelated bug in my code. Thank you everyone for your help.
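
For anyone who arrives here with the `__m256` part of the question: a minimal sketch, assuming a GCC or Clang version that accepts `__attribute__((target("avx")))` (otherwise drop the attribute and build with `-mavx`). The lowest float of a `__m256` can be read by casting to `__m128` and then calling `_mm_cvtss_f32`, which does not depend on the header providing `_mm256_cvtss_f32`. The names `m256_first`, `first8_avx`, and `hsum8_avx` are hypothetical.

```c
#include <immintrin.h>

/* Hypothetical helper: lowest float of a __m256.
   The cast is free; it just reinterprets the low 128 bits. */
__attribute__((target("avx")))
static inline float m256_first(__m256 v)
{
    return _mm_cvtss_f32(_mm256_castps256_ps128(v));
}

/* Hypothetical wrapper: load 8 floats and return element 0. */
__attribute__((target("avx")))
float first8_avx(const float *p)
{
    return m256_first(_mm256_loadu_ps(p));
}

/* Hypothetical helper: sum of the 8 floats at p. */
__attribute__((target("avx")))
float hsum8_avx(const float *p)
{
    __m256 v  = _mm256_loadu_ps(p);
    __m128 lo = _mm256_castps256_ps128(v);     /* elements 0..3 */
    __m128 hi = _mm256_extractf128_ps(v, 1);   /* elements 4..7 */
    __m128 s  = _mm_add_ps(lo, hi);            /* 8 -> 4 partial sums */
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));    /* 4 -> 2 partial sums */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1))); /* 2 -> 1 */
    return _mm_cvtss_f32(s);
}
```

The functions take a pointer rather than a `__m256` argument so that callers compiled without AVX enabled do not have to pass 256-bit vectors across the call boundary. The CPU running the code still has to support AVX.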

user24205
  • 4
    You should probably delete the question as it's unlikely to be of benefit to anyone else in the future. – Paul R Nov 21 '17 at 08:37