Questions tagged [sse2]

x86 Streaming SIMD Extensions 2 adds support for packed integer and double-precision floats in the 128-byte XMM vector registers. It is always supported on x86-64, and supported on every x86 CPU from 2003 or later.

See the x86 tag wiki for guides and other resources for programming and optimising programs using x86 vector extensions, and the SSE tag wiki for other SSE- and SSE2-related resources.

SSE2 is one of the SSE family of x86 instruction-set extensions.

SSE2 adds support for double-precision floating point, and packed-integer (8bit to 64bit elements) in XMM registers. It is baseline in x86-64, so 64bit code can always assume SSE2 support, without having to check. 32bit code could still be run on a CPU from before 2003 (Athlon XP or Pentium III) that didn't support SSE2, but this is unlikely for most newly-written code. (And so an MMX or original-SSE fallback is not worth writing.)

Most tasks that benefit from vectors at all can be done fairly efficiently using only instructions up to SSE2. This is fortunate, because widespread support for later SSE versions took time. Use of later SSE extensions typically saves a couple instructions here and there, usually with only minor speed-ups. Notably absent until SSSE3 was PSHUFB, a shuffle whose operation was controlled by elements in a register, rather than a compile-time constant imm8. It can do things that SSE2 can't do efficiently at all.

AVX provides 3-operand versions of all SSE2 instructions.

History

Intel introduced SSE2 with their Pentium 4 design in 2001.

SSE2 was adopted by AMD for its 64bit CPU line in 2003/2004. As of 2009 there remain few if any x86 CPUs (at least, in any significant numbers) that do not support the SSE2 instruction set, which makes it extremely attractive on the Windows PC platform by offering a large feature set that can practically be assumed a "minimum requirement" that will be omnipresent (which, however, at least in 32bit mode, does not remove the necessity to check processor features).

More recent instruction sets introduce fewer features which are often highly specialized, and are at the same time supported inconsistenly between manufacturers by a significantly smaller share of processors (10-50% in 2009).

SSE2 does not offer instructions for horizontal addition, which are needed for some geometric calculations (e.g. dot product) and complex arithmetic. This functionality has to be emulated with one or several shuffles, which however are often not significantly slower than the dedicated instructions in higher revisions.

275 questions

votes

5 answers

SSE SSE2 and SSE3 for GNU C++

Is there a simple tutorial for me to get up to speed in SSE, SSE2 and SSE3 in GNU C++? How can you do code optimization in SSE?

asked Mar 19 '09 at 07:32

yoitsfrancis

4,278
14
44
73

votes

4 answers

Extended (80-bit) double floating point in x87, not SSE2 - we don't miss it?

I was reading today about researchers discovering that NVidia's Phys-X libraries use x87 FP vs. SSE2. Obviously this will be suboptimal for parallel datasets where speed trumps precision. However, the article author goes on to quote: Intel started…

floating-point sse2 x87

asked Jul 08 '10 at 16:57

codekaizen

26,990
7
84
140

votes

8 answers

Why is strcmp not SIMD optimized?

I've tried to compile this program on an x64 computer: #include int main(int argc, char* argv[]) { return ::std::strcmp(argv[0], "really really really really really really really really really" "really really really really…

c++ sse simd strcmp sse2

asked Oct 27 '14 at 10:59

user1095108

14,119
9
58
116

votes

3 answers

SSE2 option in Visual C++ (x64)

I've added x64 configuration to my C++ project to compile 64-bit version of my app. Everything looks fine, but compiler gives the following warning: `cl : Command line warning D9002 : ignoring unknown option '/arch:SSE2'` Is there SSE2 optimization…

c++ visual-studio-2008 optimization 64-bit sse2

asked Jul 01 '09 at 06:53

Kirill V. Lyadvinsky

97,037
24
136
212

votes

4 answers

Performance optimisations of x86-64 assembly - Alignment and branch prediction

I’m currently coding highly optimised versions of some C99 standard library string functions, like strlen(), memset(), etc, using x86-64 assembly with SSE-2 instructions. So far I’ve managed to get excellent results in terms of performance, but I…

performance assembly x86-64 sse2 branch-prediction

asked Aug 07 '13 at 21:18

Macmade

52,708
13
106
123

votes

2 answers

Valgrind and Java

I want to use Valgrind 3.7.0 to find memory leaks in my Java native code. I'm using jdk1.6.0._29. To do that, I have to set the --trace-children=yes flag. Setting that flag, I no longer can run valgrind on any java application, even a command…

java memory-leaks java-native-interface valgrind sse2

asked Feb 09 '12 at 18:43

RezaPlusPlus

votes

4 answers

SSE2 integer overflow checking

When using SSE2 instructions such as PADDD (i.e., the _mm_add_epi32 intrinsic), is there a way to check whether any of the operations overflowed? I thought that maybe a flag on the MXCSR control register may get set after an overflow, but I don't…

c++ x86 sse simd sse2

asked May 09 '12 at 06:44

Igor ostrovsky

7,282
2
29
28

votes

3 answers

What's the difference between logical SSE intrinsics?

Is there any difference between logical SSE intrinsics for different types? For example if we take OR operation, there are three intrinsics: _mm_or_ps, _mm_or_pd and _mm_or_si128 all of which do the same thing: compute bitwise OR of their operands.…

c sse simd intrinsics sse2

asked May 10 '10 at 17:32

user283145

votes

2 answers

SSE multiplication of 4 32-bit integers

How to multiply four 32-bit integers by another 4 integers? I didn't find any instruction which can do it.

x86 sse simd multiplication sse2

asked May 08 '12 at 14:37

Yury

1,169
2
16
29

votes

5 answers

How to test if your Linux Support SSE2

Actually I have 2 questions: Is SSE2 Compatibility a CPU issue or Compiler issue? How to check if your CPU or Compiler support SSE2? I am using GCC Version: gcc (GCC) 4.5.1 When I tried to compile a code it give me this error: $ gcc -O3 -msse2…

linux unix compiler-construction sse2 itanium

asked Nov 17 '10 at 09:54

neversaint

60,904
137
310
477

votes

1 answer

Shift a __m128i of n bits

I have a __m128i variable and I need to shift its 128 bit value of n bits, i.e. like _mm_srli_si128 and _mm_slli_si128 work, but on bits instead of bytes. What is the most efficient way of doing this?

c x86 sse simd sse2

asked Jul 12 '13 at 08:29

Filippo Bistaffa

votes

3 answers

Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

I am trying to find sum reduction of 32 elements (each 1 byte data) on an Intel i3 processor. I did this: s=0; for (i=0; i<32; i++) { s = s + a[i]; } However, its taking more time, since my application is a real-time application requiring…

x86 sse simd sse2 sse3

asked Jun 07 '12 at 13:13

gpuguy

4,607
17
67
125

votes

4 answers

Determine processor support for SSE2?

I need to do determine processor support for SSE2 prior installing a software. From what I understand, I came up with this: bool TestSSE2(char * szErrorMsg) { __try { __asm { xorpd xmm0, xmm0 //…

c++ windows windows-xp sse2

asked Mar 08 '10 at 18:22

DogDog

4,820
12
44
66

votes

1 answer

Best way to load a 64-bit integer to a double precision SSE2 register?

What is the best/fastest way to load a 64-bit integer value in an xmm SSE2 register in 32-bit mode? In 64-bit mode, cvtsi2sd can be used, but in 32-bit mode, it supports only 32-bit integers. So far I haven't found much beyond: use fild, fstp to…

assembly double sse sse2 int64

asked Mar 22 '13 at 11:16

Eric Grange

5,931
1
39
61

votes

5 answers

How to divide 16-bit integer by 255 with using SSE?

I deal with image processing. I need to divide 16-bit integer SSE vector by 255. I can't use shift operator like _mm_srli_epi16(), because 255 is not a multiple of power of 2. I know of course that it is possible convert integer to float, perform…

c++ image-processing sse simd sse2

asked Feb 09 '16 at 06:28

Claudio

2 3

…

18 19 Next