I'm writing SSE code for 2-D convolution, but the SSE documentation is very sparse.

I'm calculating the dot product with _mm_dp_ps and using _mm_extract_ps to get the result, but _mm_extract_ps returns the float's bits as an int (it shows up as a hex value), and I can't figure out how to convert that to a regular float.

I could use __builtin_ia32_vec_ext_v4sf, which returns a float, but I want to stay compatible with other compilers. This is how the GCC header defines _mm_extract_ps:

int _mm_extract_ps (__m128 __X, const int __N)
{
  union { int i; float f; } __tmp;
  __tmp.f = __builtin_ia32_vec_ext_v4sf ((__v4sf)__X, __N);
  return __tmp.i;
}

What am I missing?

Any help would be appreciated, thanks.

OpenSUSE 11.2, GCC 4.4.1, C++

Compiler options: -fopenmp -Wall -O3 -msse4.1 -march=core2

Linker options: -lgomp -Wall -O3 -msse4.1 -march=core2

tijko

4 Answers

You should be able to use _MM_EXTRACT_FLOAT.

Incidentally it looks to me as if _mm_extract_ps and _MM_EXTRACT_FLOAT should be the other way around, i.e. _mm_extract_ps should return a float and _MM_EXTRACT_FLOAT should return the int representation, but what do I know.

Paul R
  • I guess it's down to the way that Intel describe the instruction in their documentation, which may or may not be an error - the gcc headers just implement what is there in the documentation. – Paul R Jun 28 '10 at 16:23
  • And is there an `_MM_EXTRACT_DOUBLE` of some sort? – Ciro Santilli OurBigBook.com May 31 '19 at 09:11
  • @CiroSantilli新疆改造中心996ICU六四事件: see [`_mm_cvtsd_f64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=1782,1788,1782,1782&text=_mm_cvtsd_f64). – Paul R May 31 '19 at 14:08
  • `_mm_extract_ps` is the intrinsic for [SSE4.1 `extractps`](https://www.felixcloutier.com/x86/extractps), the "fp" version of `pextrd`. dst = memory or a GP integer register, not an XMM register. It can't extract a scalar float into a new register. For that, use `pshufd`. (Or `insertps` with a false dependency). The compiler can use `extractps` to extract a float to memory, but the only use-case for the `_mm_extract_ps` intrinsic in C/C++ is getting the integer bit-pattern of a `float` into a scalar `uint32_t`. Type punning it back to float is just asking for the compiler to emit slow code. – Peter Cordes Jun 01 '19 at 07:46
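
To make the last comment concrete, here is a minimal sketch (not from the thread) of what the "hex float" in the question is. _mm_extract_ps hands back the lane's bit pattern as an int; copying those bytes into a float with memcpy recovers the value, while _MM_EXTRACT_FLOAT yields the float directly. The helper names and the USE_SSE41 macro are made up for illustration. The macro assumes GCC or Clang and is only there so the snippet builds without -msse4.1; with the flags from the question it can be dropped.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_extract_ps, _MM_EXTRACT_FLOAT */
#include <string.h>     /* memcpy */

/* Assumption: GCC or Clang. Enables SSE4.1 per function so that
   -msse4.1 is not required on the command line. */
#if defined(__GNUC__)
#define USE_SSE41 __attribute__((target("sse4.1")))
#else
#define USE_SSE41
#endif

/* Hypothetical helper: read lane 3 as raw bits, then reinterpret as float. */
USE_SSE41 static float lane3_via_bits(__m128 v)
{
    int bits = _mm_extract_ps(v, 3);  /* bit pattern, e.g. 0x3fc00000 for 1.5f */
    float f;
    memcpy(&f, &bits, sizeof f);      /* reinterpret the bytes, no numeric conversion */
    return f;
}

/* Hypothetical helper: the same lane through the macro, which gives a float. */
USE_SSE41 static float lane3_direct(__m128 v)
{
    float f;
    _MM_EXTRACT_FLOAT(f, v, 3);
    return f;
}
```

For x = _mm_set_ps(1.5f, 2.5f, 3.5f, 4.5f), both helpers should return 1.5f. As the comment warns, the round trip through an integer is likely to produce slower code than reading the float directly.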
4

_mm_cvtss_f32(_mm_shuffle_ps(__X, __X, __N)) will do the job.
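
Applied to the dot product from the question, this could look roughly like the sketch below. It is an illustration, not the asker's code: dot4, dot4_lane2 and USE_SSE41 are hypothetical names, and the macro assumes GCC or Clang (it only avoids needing -msse4.1 on the command line). When the _mm_dp_ps mask writes the sum to lane 0, no shuffle is needed at all; the shuffle only matters for other lanes, and its index has to be a compile-time constant.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_dp_ps (also pulls in the SSE headers) */

/* Assumption: GCC or Clang. Enables SSE4.1 per function so that
   -msse4.1 is not required on the command line. */
#if defined(__GNUC__)
#define USE_SSE41 __attribute__((target("sse4.1")))
#else
#define USE_SSE41
#endif

/* Hypothetical helper: dot product of two 4-float arrays. */
USE_SSE41 static float dot4(const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    /* 0xF1: multiply all four lanes (high nibble) and write the sum
       to lane 0 only (low nibble). */
    __m128 dp = _mm_dp_ps(va, vb, 0xF1);
    return _mm_cvtss_f32(dp);  /* lane 0 as a float */
}

/* Hypothetical helper: broadcast the sum to every lane (0xFF), then
   read lane 2 with the shuffle shown above. */
USE_SSE41 static float dot4_lane2(const float *a, const float *b)
{
    __m128 dp = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xFF);
    return _mm_cvtss_f32(_mm_shuffle_ps(dp, dp, 2));
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, both helpers should return 70.0f.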

Roman Zavalov

To illustrate everything mentioned so far:

main.c

#include <assert.h>

#include <x86intrin.h>

int main(void) {

    /* 32-bit. */
    {
        __m128 x = _mm_set_ps(1.5f, 2.5f, 3.5f, 4.5f);

        /* _MM_EXTRACT_FLOAT */
        float f;
        _MM_EXTRACT_FLOAT(f, x, 3);
        assert(f == 1.5f);
        _MM_EXTRACT_FLOAT(f, x, 2);
        assert(f == 2.5f);
        _MM_EXTRACT_FLOAT(f, x, 1);
        assert(f == 3.5f);
        _MM_EXTRACT_FLOAT(f, x, 0);
        assert(f == 4.5f);

        /* _mm_cvtss_f32 + _mm_shuffle_ps */
        assert(_mm_cvtss_f32(x) == 4.5f);
        assert(_mm_cvtss_f32(_mm_shuffle_ps(x, x, 1)) == 3.5f);
        assert(_mm_cvtss_f32(_mm_shuffle_ps(x, x, 2)) == 2.5f);
        assert(_mm_cvtss_f32(_mm_shuffle_ps(x, x, 3)) == 1.5f);
    }

    /* 64-bit. */
    {
        __m128d x = _mm_set_pd(1.5, 2.5);
        /* _mm_cvtsd_f64 + _mm_unpackhi_pd */
        assert(_mm_cvtsd_f64(x) == 2.5);
        assert(_mm_cvtsd_f64(_mm_unpackhi_pd(x, x)) == 1.5);
    }
}

GitHub upstream.

Compile and run:

gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
./main.out

Doubles mentioned at: _mm_cvtsd_f64 analogon for higher order floating point

Tested on Ubuntu 19.04 amd64.

Ciro Santilli OurBigBook.com

extern void _mm_store_ss(float*, __m128);

See xmmintrin.h. It stores the lowest element of the vector to the float that the pointer refers to.
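
A minimal sketch of how that might be used, assuming plain SSE only (no SSE4.1 needed). The helper names are hypothetical. _mm_store_ss always writes lane 0, so any other lane has to be shuffled down first.

```c
#include <xmmintrin.h>  /* SSE: _mm_store_ss, _mm_shuffle_ps */

/* Hypothetical helper: write lane 0 of v to *out. */
static void store_lane0(float *out, __m128 v)
{
    _mm_store_ss(out, v);
}

/* Hypothetical helper: move lane 2 down to lane 0, then store it. */
static void store_lane2(float *out, __m128 v)
{
    _mm_store_ss(out, _mm_shuffle_ps(v, v, 2));
}
```

For x = _mm_set_ps(1.5f, 2.5f, 3.5f, 4.5f), store_lane0 should write 4.5f and store_lane2 should write 2.5f.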

SugarD