Questions tagged [sse3]

SSE3, Streaming Single Instruction Multiple Data Extensions 3, is the third iteration of the SSE instruction set for the IA-32 (x86) architecture.

SSE3, Streaming Single Instruction Multiple Data Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their Pentium 4 CPU. In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of their Athlon 64 CPUs.
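
Here is a minimal sketch of what the SSE3 additions look like from C. SSE3 introduced horizontal operations such as HADDPS, which `pmmintrin.h` exposes as `_mm_hadd_ps`. The function name `hsum4_sse3` is hypothetical. The `target("sse3")` attribute assumes GCC or Clang; it lets this one function use SSE3 without building the whole file with `-msse3`.

```c
#include <pmmintrin.h> /* SSE3 intrinsics: _mm_hadd_ps, _mm_movehdup_ps, ... */

/* Hypothetical helper: sum the four floats at p with two SSE3 horizontal adds.
   The target attribute (GCC/Clang only) enables SSE3 for this function alone. */
__attribute__((target("sse3")))
float hsum4_sse3(const float *p)
{
    __m128 v = _mm_loadu_ps(p);  /* {p0, p1, p2, p3} */
    v = _mm_hadd_ps(v, v);       /* {p0+p1, p2+p3, p0+p1, p2+p3} */
    v = _mm_hadd_ps(v, v);       /* every lane: (p0+p1)+(p2+p3) */
    return _mm_cvtss_f32(v);     /* extract lane 0 */
}
```

On many CPUs HADDPS decodes into several micro-ops, so a shuffle-and-add sequence is often just as fast. Treat the intrinsic as a convenience rather than a guaranteed speedup.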

28 questions

3 answers

Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

I am trying to compute the sum reduction of 32 elements (1 byte each) on an Intel i3 processor. I did this: s=0; for (i=0; i<32; i++) { s = s + a[i]; } However, it's taking too much time, since my application is a real-time application requiring…

asked Jun 07 '12 at 13:13

gpuguy
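
For the 32-byte sum, a common SSE2 approach is PSADBW (`_mm_sad_epu8`) against a zero vector; this sketch is not taken from the linked answers. The sum of absolute differences from zero is just the plain sum of each group of 8 bytes, and it is computed at 16-bit width, so nothing overflows. The function name is hypothetical. The sketch assumes the 32 bytes are contiguous and targets x86-64, where SSE2 is always available.

```c
#include <emmintrin.h> /* SSE2: _mm_sad_epu8 */
#include <stdint.h>

/* Hypothetical helper: sum 32 unsigned bytes without 8-bit overflow.
   PSADBW against zero leaves the sum of 8 bytes in each 64-bit lane. */
uint32_t sum32_bytes_sse2(const uint8_t *a)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i lo = _mm_loadu_si128((const __m128i *)a);
    __m128i hi = _mm_loadu_si128((const __m128i *)(a + 16));
    /* Each SAD yields two partial sums; adding them gives two lanes of 16 bytes each. */
    __m128i s = _mm_add_epi64(_mm_sad_epu8(lo, zero), _mm_sad_epu8(hi, zero));
    /* Fold the upper 64-bit lane onto the lower one. */
    s = _mm_add_epi64(s, _mm_unpackhi_epi64(s, s));
    return (uint32_t)_mm_cvtsi128_si32(s); /* at most 32 * 255 = 8160 */
}
```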

2 answers

Optimizing code using Intel SSE intrinsics for vectorization

This is my very first time working with SSE intrinsics. I am trying to convert a simple piece of code into a faster version using Intel SSE intrinsic (up to SSE4.2). I seem to encounter a number of errors. The scalar version of the code is: (simple…

c sse sse3 sse4

asked Jun 08 '12 at 16:50

PGOnTheGo

3 answers

SSE instruction set not enabled

I am having trouble with this error: "SSE instruction set not enabled". How can I fix this? I have an Acer i7 machine running Ubuntu 11.10. Any help will be appreciated! Also running: sudo cat /proc/cpuinfo | grep…

c++ intrinsics sse2 sse3

asked Feb 04 '12 at 21:06

ksolid
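
Older GCC releases raise this error from their intrinsics headers when the matching `-m` flag was not passed. For example, `pmmintrin.h` contains `#error "SSE3 instruction set not enabled"`, which fires without `-msse3` or `-march=native`. The sketch below assumes GCC or Clang. The `__SSE3__` macro shows what the compiler was allowed to use, while `__builtin_cpu_supports` asks the CPU at run time. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Compile time: -msse3 (or -march=native on an SSE3-capable machine) makes
   the compiler define __SSE3__. Build, e.g.: gcc -O2 -msse3 prog.c */
bool built_with_sse3(void)
{
#ifdef __SSE3__
    return true;
#else
    return false;
#endif
}

/* Run time: GCC/Clang builtin that queries CPUID. */
bool cpu_has_sse3(void)
{
    __builtin_cpu_init(); /* strictly needed only if called before constructors run */
    return __builtin_cpu_supports("sse3") != 0;
}
```

If `built_with_sse3()` is true, run the program only on CPUs for which `cpu_has_sse3()` is also true.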

1 answer

How does _mm_mwait work?

How does _mm_mwait from pmmintrin.h work? (I don't mean the asm for it, but what it does and how that is carried out on NUMA systems. Store monitoring is easy to implement only on bus-based SMP systems with bus snooping.) What processors does…

atomic intrinsics numa sse3

asked Apr 02 '10 at 02:23

osgx

1 answer

(Vec4 x Mat4x4) product using SIMD and improvements

I am writing a complex simulation program and it appears that the most time-consuming routine is the one that multiplies a four-vector (float4) by a 4x4 matrix. I need to run this program on several computers, which are more or less old. That is…

c++ matrix simd avx sse3

asked Jun 26 '15 at 15:01

Asohan
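
One common SSE formulation broadcasts each component of the vector and accumulates the scaled matrix rows. This sketch assumes a row vector times a row-major matrix, since the question does not state its layout. It needs only SSE1, so it also runs on older machines. The function name `vec4_mul_mat4` is hypothetical.

```c
#include <xmmintrin.h> /* SSE1 */

/* Hypothetical helper: r = v * M, with v a row vector and m row-major
   (m[0..3] is row 0). r = v0*row0 + v1*row1 + v2*row2 + v3*row3. */
void vec4_mul_mat4(const float v[4], const float m[16], float r[4])
{
    __m128 vv = _mm_loadu_ps(v);
    __m128 x = _mm_shuffle_ps(vv, vv, _MM_SHUFFLE(0, 0, 0, 0)); /* broadcast v0 */
    __m128 y = _mm_shuffle_ps(vv, vv, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast v1 */
    __m128 z = _mm_shuffle_ps(vv, vv, _MM_SHUFFLE(2, 2, 2, 2)); /* broadcast v2 */
    __m128 w = _mm_shuffle_ps(vv, vv, _MM_SHUFFLE(3, 3, 3, 3)); /* broadcast v3 */
    __m128 acc = _mm_mul_ps(x, _mm_loadu_ps(m + 0));
    acc = _mm_add_ps(acc, _mm_mul_ps(y, _mm_loadu_ps(m + 4)));
    acc = _mm_add_ps(acc, _mm_mul_ps(z, _mm_loadu_ps(m + 8)));
    acc = _mm_add_ps(acc, _mm_mul_ps(w, _mm_loadu_ps(m + 12)));
    _mm_storeu_ps(r, acc);
}
```

For a column-major matrix, the same code computes M * v with v as a column vector.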

1 answer

What is the difference between _mm_movehdup_ps and _mm_shuffle_ps in this case?

If my understanding is correct, _mm_movehdup_ps(a) gives the same result as _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 3, 3))? Is there a performance difference between the two?

x86 sse intrinsics micro-optimization sse3

asked May 21 '19 at 12:21

ThreeStarProgrammer57
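
`_MM_SHUFFLE` lists the lane selectors from the highest element down, so `_MM_SHUFFLE(1, 1, 3, 3)` actually produces {a3, a3, a1, a1}. `_mm_movehdup_ps` duplicates the odd-indexed elements into {a1, a1, a3, a3}, so the matching shuffle is `_MM_SHUFFLE(3, 3, 1, 1)`. Here is a small sketch to check this. It assumes GCC or Clang for the `target` attribute, and the function names are hypothetical.

```c
#include <pmmintrin.h> /* SSE3: _mm_movehdup_ps (also pulls in SSE/SSE2) */

/* Hypothetical helper: MOVSHDUP, duplicating odd-indexed floats. */
__attribute__((target("sse3")))
void dup_odd_movehdup(const float *p, float *out)
{
    _mm_storeu_ps(out, _mm_movehdup_ps(_mm_loadu_ps(p))); /* {a1, a1, a3, a3} */
}

/* Hypothetical helper: the equivalent SHUFPS (SSE1 only). */
void dup_odd_shuffle(const float *p, float *out)
{
    __m128 a = _mm_loadu_ps(p);
    _mm_storeu_ps(out, _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 3, 1, 1))); /* {a1, a1, a3, a3} */
}
```

The results are identical. One practical difference is that, without AVX, SHUFPS overwrites its first source register, so keeping `a` alive costs an extra register copy. MOVSHDUP writes to a separate destination register instead.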

2 answers

Performance degradation when using alternatives to SSSE3 Intel intrinsics

I am developing a performance-critical application which has to be ported to an Intel Atom processor that only supports MMX, SSE, SSE2 and SSE3. My previous application had support for SSSE3 as well as AVX; now I want to downgrade it to Intel Atom…

intel sse simd sse3 intel-atom

asked Feb 21 '14 at 07:30

Harrisson
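
This example shows one way to replace an SSSE3 intrinsic with SSE2. The question does not say which intrinsics it uses, so `_mm_abs_epi16` is only an illustration. PABSW can be emulated as max(x, 0 - x), at the cost of two instructions instead of one. The helper names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 only */
#include <stdint.h>

/* SSE2 stand-in for SSSE3 _mm_abs_epi16 (PABSW): |x| = max(x, 0 - x).
   For INT16_MIN both operands are 0x8000, which matches PABSW's result. */
static __m128i abs_epi16_sse2(__m128i x)
{
    return _mm_max_epi16(x, _mm_sub_epi16(_mm_setzero_si128(), x));
}

/* Hypothetical wrapper: absolute values of 8 int16 values. */
void abs8_i16(const int16_t *in, int16_t *out)
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, abs_epi16_sse2(v));
}
```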

2 answers

How to enable SSSE3 intrinsics but disable their use in compiler optimization

I have code that uses SSSE3 intrinsics (note the triple S) and a runtime check for whether to use them, so I assumed that the application would execute on CPUs without SSSE3 support. However, when using -mssse3 with -O1 optimization the…

c++ optimization gcc sse sse3

asked Jul 16 '13 at 07:52

Mark S
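
One common way to get this behavior with GCC or Clang is sketched below. Don't pass `-mssse3` for the whole file. Instead, mark only the SSSE3 function with `__attribute__((target("ssse3")))` and choose it at run time with `__builtin_cpu_supports("ssse3")`. The byte-reversal task and all function names are hypothetical.

```c
#include <stdint.h>
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 */

/* Only this function may use SSSE3; no -mssse3 flag is needed for the file. */
__attribute__((target("ssse3")))
static void reverse16_ssse3(const int8_t *in, int8_t *out)
{
    const __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                      7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, idx)); /* PSHUFB */
}

/* Portable fallback; in and out must not overlap. */
static void reverse16_scalar(const int8_t *in, int8_t *out)
{
    for (int i = 0; i < 16; i++)
        out[i] = in[15 - i];
}

/* Runtime dispatch: take the SSSE3 path only if CPUID reports it. */
void reverse16(const int8_t *in, int8_t *out)
{
    if (__builtin_cpu_supports("ssse3"))
        reverse16_ssse3(in, out);
    else
        reverse16_scalar(in, out);
}
```

Because the rest of the file is compiled for the baseline, the optimizer cannot emit SSSE3 instructions outside `reverse16_ssse3`.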

1 answer

Converting 24-bit to 16-bit audio using SSE/SIMD instructions

I wonder if there is a fast method to do 24-bit to 16-bit quantization on an array of audio samples (using intrinsics or asm). The source format is signed 24-bit little-endian. Update: I managed to get the conversion done as described: static void __cdecl…

audio simd sse2 quantization sse3

asked May 02 '15 at 21:40

ohrfritz
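
To pin down the arithmetic, here is a plain C reference that truncates signed 24-bit little-endian samples to 16 bits. It applies no rounding or dithering; that is an assumption, since the question does not say which it wants. Keeping bytes 1 and 2 of each 3-byte sample is the same as an arithmetic shift right by 8. A SIMD version would gather those byte pairs, and this loop can serve as a reference to test it against. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: n samples of signed 24-bit LE -> signed 16-bit,
   by truncation (drop the low byte, keep the top 16 bits). */
void s24le_to_s16(const uint8_t *src, int16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const uint8_t *s = src + 3 * i; /* s[0] = low, s[1] = mid, s[2] = high byte */
        /* The uint16 -> int16 cast wraps on GCC/Clang/MSVC (implementation-defined in C). */
        dst[i] = (int16_t)(uint16_t)(s[1] | (s[2] << 8));
    }
}
```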

1 answer

C++ SSE3 instruction set not enabled

I'm trying to write some hidden Markov model code in C++ using the HMMlib library from http://www.cs.au.dk/~asand/?page_id=152 on Ubuntu 12.04 with gcc/g++ 4.6. My compile command is: g++ -I/usr/local/boost_1_52_0 -I../…

c++ compiler-errors sse3

asked Feb 15 '13 at 11:36

Aditya Sihag

1 answer

Conversion of four packed single-precision floating-point values to unsigned doublewords in x86 SSE

Is there a way to convert four packed single-precision floating-point values to four doublewords on x86 with the SSE extensions? The closest instruction would be CVTPS2PI, but it cannot operate on two XMM registers; instead, it must be given as…

assembly x86-64 sse floating-point-conversion sse3

asked Oct 29 '20 at 13:42

Iman Abdollahzadeh
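
For full XMM registers, SSE2 added CVTPS2DQ and CVTTPS2DQ (`_mm_cvtps_epi32`, `_mm_cvttps_epi32`), which convert four floats to four signed 32-bit integers. Unsigned results need one extra step. This sketch assumes truncation and inputs in [0, 2^32). Lanes at or above 2^31 are shifted down by 2^31 before the signed conversion, and the top bit is restored afterwards. The function name is hypothetical, and negative, NaN, or too-large inputs are not handled.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: truncating float -> uint32 for 4 lanes in [0, 2^32). */
void cvtt_ps_u32(const float *in, uint32_t *out)
{
    const __m128 two31 = _mm_set1_ps(2147483648.0f);
    __m128 x = _mm_loadu_ps(in);
    __m128 big = _mm_cmpge_ps(x, two31);          /* all-ones where x >= 2^31 */
    x = _mm_sub_ps(x, _mm_and_ps(big, two31));    /* now every lane fits in int32 (exact) */
    __m128i i = _mm_cvttps_epi32(x);              /* CVTTPS2DQ: truncating convert */
    /* Turn the all-ones mask into 0x80000000 and XOR it in to add 2^31 back. */
    i = _mm_xor_si128(i, _mm_slli_epi32(_mm_castps_si128(big), 31));
    _mm_storeu_si128((__m128i *)out, i);
}
```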

1 answer

Intel intrinsics support for Atom Cloverview processors

I have an application which was designed for Sandy Bridge processors using SSE through AVX, and now I want the same application to run on Atom processors. I was recently browsing the net for intrinsics support on Atom Cloverview processors. Everywhere it mentions…

intel simd sse2 sse3 intel-atom

asked Feb 22 '14 at 09:15

Harrisson

2 answers

Memory Access Violations When Using SSE Operations

I've been trying to re-implement some existing vector and matrix classes to use SSE3 instructions, and I seem to be running into these "memory access violation" errors whenever I perform a series of operations on an array of vectors. I'm relatively new…

c++ ubuntu vector sse sse3

asked Sep 12 '12 at 18:14

Eric Foote
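
One frequent cause of such crashes is using aligned loads and stores such as `_mm_load_ps` and `_mm_store_ps` on memory that is not 16-byte aligned. Whether that applies here is an assumption, since the code is not shown. The sketch below allocates aligned storage with `_mm_malloc`, and the helper names are hypothetical. Where alignment cannot be guaranteed, `_mm_loadu_ps` and `_mm_storeu_ps` avoid the fault.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE; with GCC/Clang this also declares _mm_malloc/_mm_free */

/* Hypothetical helper: n vec4s on a 16-byte boundary. Free with _mm_free. */
float *alloc_vec4_array(size_t n)
{
    return (float *)_mm_malloc(n * 4 * sizeof(float), 16);
}

/* a[i] += b[i] for n vec4s; both arrays must be 16-byte aligned. */
void add_vec4_arrays(float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        __m128 va = _mm_load_ps(a + 4 * i); /* aligned load: faults if misaligned */
        __m128 vb = _mm_load_ps(b + 4 * i);
        _mm_store_ps(a + 4 * i, _mm_add_ps(va, vb));
    }
}

/* Returns nonzero if p is 16-byte aligned. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}
```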

2 answers

How do I enable the SSE3/SSE4.1 instruction set in Visual Studio 2008?

I tried to follow: Project > Properties > Configuration Properties > C/C++ > Code Generation > Enable Enhanced Instruction Set, but the only options available were SSE and SSE2.

visual-studio-2008 sse simd sse3

asked May 05 '10 at 20:06

Igor

1 answer

SIMD integer store

I am writing a program using SSE instructions to multiply and add integer values. I did the same program with floats, but I am missing an instruction for my integer version. With floats, after I have finished all my operations, I return the values…

vectorization sse simd mmx sse3

asked Nov 03 '13 at 11:39

Thudor
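
For integers, the counterparts of `_mm_store_ps`/`_mm_storeu_ps` are `_mm_store_si128` (16-byte-aligned address) and `_mm_storeu_si128` (any address), both SSE2. This sketch uses 16-bit elements because a packed 32-bit low multiply (`_mm_mullo_epi32`) only arrived with SSE4.1. The function name is hypothetical, and products wrap to 16 bits.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for 8 int16 values
   (low 16 bits of each product, wrapping on overflow). */
void muladd8_i16(const int16_t *a, const int16_t *b, const int16_t *c, int16_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vc = _mm_loadu_si128((const __m128i *)c);
    __m128i r = _mm_add_epi16(_mm_mullo_epi16(va, vb), vc);
    _mm_storeu_si128((__m128i *)out, r); /* integer counterpart of _mm_storeu_ps */
}
```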