What's the difference between SIMD and SSE?

Question

I am confused, what's the difference between SIMD and SSE, SSE2, SSE3, AVX etc?

According to my knowledge and research, SIMD is an architecture that allows a single instruction to operate on multiple data elements, and SSE and AVX are instruction sets that implement a SIMD architecture.

And also is there a difference between vector sizes of each architecture like SSE has 128 bits and AVX has 256 bits? If the underlying SIMD architecture is the same (I think), then how do different ISAs have different vector sizes?

I'm not sure if this is true, can someone explain to me in detail what actually happens?

I'm not sure that this question is entirely on-topic, as you seem to be referring to the concepts themselves rather than asking a programming question. — Tim McNamara, May 17 '15 at 01:05
SIMD is a programming paradigm in which multiple elements of data are processed by the same instruction. SSE and AVX are both extensions of the x86 instruction set, and are implementations of this SIMD concept. There is nothing in the definition of SIMD that requires 128 bits or 256 bits to be processed at a time and no more no less. Moreover, SSE and AVX can coexist. — Iwillnotexist Idonotexist, May 17 '15 at 01:08

score 11 · Accepted Answer · answered May 17 '15 at 01:28


The Wikipedia page (http://en.m.wikipedia.org/wiki/SIMD) does a good job of explaining SIMD, and the instruction sets that implement it.

Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

SIMD is the 'concept', SSE/AVX are implementations of the concept. All SIMD instruction sets are just that, a set of instructions that the CPU can execute on multiple data points. As long as the CPU supports executing the instructions, then it is feasible for multiple SIMD instruction sets to coexist, regardless of data size.
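
To make the concept/implementation split concrete, here is a minimal sketch in C, assuming an x86 compiler that ships `<xmmintrin.h>` (GCC, Clang and MSVC do). The function names `add_scalar` and `add_sse` are made up for illustration. Both compute the same result; the SSE version simply handles four floats per instruction.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics: __m128, _mm_add_ps, ... */

/* Scalar version: one float addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SSE version: each _mm_add_ps adds four floats (128 bits) at once. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                     /* scalar cleanup for leftovers */
        out[i] = a[i] + b[i];
}
```

On x86-64 this builds without extra flags because SSE is part of the baseline; a 32-bit build would probably need something like `-msse`.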


MuertoExcobito


Is the CPU architecture different for processors that support different ISAs? For example, is the architecture of an x86 processor supporting SSE2 different from that of an x86-based CPU supporting AVX or MIC? – Mr.Grey May 17 '15 at 02:13
It is hard to answer that in general (and kind of off topic for SO), and it depends on what your definition of CPU architecture is. SIMD may be just an add-on coprocessor, meaning the 'base' CPU could be identical, as with the ARM Cortex-A8 and NEON (which was optional). Consult the specific chip documentation. – MuertoExcobito May 17 '15 at 10:51

Peter Cordes · Answer 2 · 2023-07-19T04:33:59.843

SIMD = Single Instruction, Multiple Data. It's a concept in CPU architecture.

Many ISAs have SIMD extensions, like PowerPC's AltiVec, ARM's NEON / AArch64's ASIMD, etc.

SSE is an instruction-set extension for x86. (And baseline for x86-64, along with SSE2).

SSE1 and SSE2 provide a bunch of SIMD load/store and computation instructions (128-bit vector width) for float (SSE1), double, and 8 to 64-bit integer types (SSE2). Instructions like addps xmm, xmm/m128 (add Packed Single-precision) and pmaddwd xmm, xmm/m128 (SSE2).
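
As a rough illustration of pmaddwd from C, the sketch below uses the SSE2 intrinsic `_mm_madd_epi16`, which compilers normally turn into that instruction. The helper name `dot8_i16` is hypothetical, and the fixed length of 8 elements is just an assumption to keep the example short.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics (integer SIMD in XMM registers) */

/* Dot product of two arrays of exactly 8 int16_t values. */
int32_t dot8_i16(const int16_t *a, const int16_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* pmaddwd: multiply 8 pairs of 16-bit values, then add adjacent
       products, giving four 32-bit sums. */
    __m128i prod = _mm_madd_epi16(va, vb);

    /* Horizontal sum of the four 32-bit lanes. */
    __m128i swapped = _mm_shuffle_epi32(prod, _MM_SHUFFLE(1, 0, 3, 2)); /* swap 64-bit halves */
    __m128i sum = _mm_add_epi32(prod, swapped);
    swapped = _mm_shuffle_epi32(sum, _MM_SHUFFLE(2, 3, 0, 1));          /* swap adjacent lanes */
    sum = _mm_add_epi32(sum, swapped);

    return _mm_cvtsi128_si32(sum);  /* low 32-bit lane holds the total */
}
```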

But SIMD isn't the only thing that came with SSE

SSE1+SSE2 also provide scalar instructions for float/double math in the low elements of XMM registers, making x87 mostly obsolete. Instructions like movsd, addsd (add Scalar Double-precision), ucomisd (compare scalar-double into Integer FLAGS, like fcomi). Before SSE1/2, scalar math was done in the x87 register stack, with one-operand stack instructions, frequently requiring extra fxch instructions when more than one FP variable or temporary was being worked on at once. Bad for instruction-level parallelism and not a good compiler target.
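
For a feel of the scalar side, here is a small sketch using the SSE2 scalar intrinsics. The function names are made up for illustration. In practice you rarely write this by hand: on x86-64, compilers typically emit addsd for a plain `x + y` on doubles anyway.

```c
#include <emmintrin.h>  /* SSE2: __m128d and the *_sd scalar intrinsics */

/* Plain C; x86-64 compilers typically compile this to a single addsd. */
double add_plain(double x, double y)
{
    return x + y;
}

/* The same thing spelled out with intrinsics. */
double add_sd_intrinsic(double x, double y)
{
    __m128d vx = _mm_set_sd(x);   /* x in the low element, upper element zeroed */
    __m128d vy = _mm_set_sd(y);
    return _mm_cvtsd_f64(_mm_add_sd(vx, vy));  /* addsd only touches the low element */
}

/* Scalar compare via ucomisd: returns 1 if x < y, else 0. */
int less_than_sd(double x, double y)
{
    return _mm_ucomilt_sd(_mm_set_sd(x), _mm_set_sd(y));
}
```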

SSE also provides some NT stores like movntps to bypass cache and avoid MESI RFOs (Read For Ownership) when storing large amounts of data, so such writes don't cost double (read to fill cache and then write on eviction). See also Enhanced REP MOVSB for memcpy for more about memory bandwidth and non-RFO stores.
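
A minimal sketch of what using movntps looks like from C, via `_mm_stream_ps`. The function name `fill_stream` is hypothetical, and the code assumes `dst` is 16-byte aligned (movntps requires that). Whether NT stores actually help depends on the buffer size and the CPU, so treat this as a shape to measure, not a recommendation.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1: _mm_stream_ps, _mm_sfence, _mm_malloc */

/* Fill dst[0..n) with value using non-temporal stores.
   Assumption: dst is 16-byte aligned. */
void fill_stream(float *dst, size_t n, float value)
{
    __m128 v = _mm_set1_ps(value);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_stream_ps(dst + i, v);  /* movntps: store that bypasses the cache */
    for (; i < n; i++)              /* ordinary stores for the leftover tail */
        dst[i] = value;
    _mm_sfence();                   /* order the NT stores before later stores */
}
```

An aligned buffer can come from `_mm_malloc(bytes, 16)` (freed with `_mm_free`) or from a C11 `_Alignas(16)` array.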

SSE also provides some memory-barrier instructions like sfence (SSE1) and mfence (SSE2). sfence is useful for ordering NT stores wrt. other stores. mfence would have been useful as a StoreLoad barrier if it wasn't slower than a dummy lock or byte [esp], 0. lfence (SSE2) also exists but isn't useful for memory ordering in x86's already strongly-ordered memory model, but is useful for blocking out-of-order exec of instructions like rdtsc. (Does the Intel Memory Model make SFENCE and LFENCE redundant?)
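
The lfence + rdtsc use case might look roughly like this. It assumes GCC or Clang, where `<x86intrin.h>` provides `__rdtsc` and `_mm_lfence` (MSVC has them in `<intrin.h>` instead). The function name `rdtsc_fenced` is made up for illustration.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang assumption: __rdtsc and _mm_lfence */

/* Read the time-stamp counter with lfence on both sides, so that
   out-of-order execution does not move rdtsc across the surrounding code. */
uint64_t rdtsc_fenced(void)
{
    _mm_lfence();             /* wait for earlier instructions to finish */
    uint64_t t = __rdtsc();
    _mm_lfence();             /* keep later instructions from starting early */
    return t;
}
```

A typical use is `t0 = rdtsc_fenced(); work(); t1 = rdtsc_fenced();` and then looking at `t1 - t0`, keeping in mind that the TSC counts reference cycles, not core clock cycles.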

Many ISAs would already have memory-barrier instructions as part of their basic integer ISA, so having these as part of SSE was mostly due to SSE introducing NT stores. Most ISAs already had non-bad scalar FP math instructions so that architectural state could get extended for SIMD, unlike x86 where the x87 stack was inconvenient and small.

CPUs with AVX support that as well as SSE. For mixed element sizes or for "cleanup" of leftover elements with an odd count, it can be useful to use a mix of 128-bit and 256-bit vectors. Or 128-bit vectors just as part of summing elements within a vector down to one, usually after a loop that summed vertically.
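
Here is a sketch of that last pattern: sum vertically with 256-bit vectors in the loop, then narrow to 128 bits for the horizontal sum, with scalar cleanup for the leftover elements. It assumes GCC or Clang (the `target("avx")` attribute lets this one function use AVX without building the whole file with `-mavx`), and the name `sum_avx` is hypothetical. Only call it on a CPU that supports AVX.

```c
#include <stddef.h>
#include <immintrin.h>  /* AVX intrinsics: __m256, _mm256_add_ps, ... */

__attribute__((target("avx")))  /* GCC/Clang-specific assumption */
float sum_avx(const float *x, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));  /* vertical sums, 8 lanes */

    /* Narrow 256 -> 128 bits: add the high half onto the low half. */
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);

    /* 4 lanes -> 2 -> 1 with 128-bit instructions. */
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));      /* lanes 0,1 += lanes 2,3 */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));  /* lane 0 += lane 1 */

    float total = _mm_cvtss_f32(s);
    for (; i < n; i++)                           /* scalar cleanup of the odd tail */
        total += x[i];
    return total;
}
```

Note that the additions happen in a different order than in a simple scalar loop, so results can differ slightly in the last bits for general floating-point data.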

But normally in a function that already depends on AVX, you'd use the AVX encoding of 128-bit instructions. CPUs with AVX support both for backwards compatibility; AVX implies SSE. AVX1 + AVX2 provide 256-bit versions of existing FP (AVX1) and integer (AVX2) instructions, as well as adding some new instructions like shuffles.

See https://stackoverflow.com/tags/sse/info for SSE history from MMX and SSE1 through later extensions.

AVX-512 has many more new instructions.

The Hiker · Answer 3 · 2023-07-25T09:50:01.490

Asked 8 years ago. However, I'd like to try to explain it.

SIMD ==> Basically, it's a processor technology that improves parallel processing efficiency (for suitable workloads, it makes your PC's operations faster than usual).

OBS N1: you can use data types wider than the standard C data types (for example, the 128-bit vector types available since SSE).

(Example) Let's say you are creating an x64 2D game in C (Visual Studio + Windows API). You've got yourself a window, a Pixel struct (defining what an RGB pixel is, for example), a backbuffer, a GameBITMAP struct, a game loop, etc. Whenever you render your game, your PC will take a lot of time (costing you performance) if a single pixel is "painted" at a time, right? What if there was a way to "paint" more pixels at a time? Well, there is. By using SIMD you're able to do it.

Now, here is a very simple question (that comes with two): "How many pixels can you 'paint' at a time? And how fast?"

The answer is the same for both of them: that's up to your processor.

If your processor is from 2004 or newer, it probably supports MMX and SSE technology. For example, by using SSE3 (a member of the SSE family), you're able to do the same operation of "painting" pixels several at a time (with a maximum vector size of 16 bytes/128 bits).

So, if your processor supports even more modern technology, say something from the AVX family (of course, with AVX, you can still use MMX and SSE), you will be able to do even more operations with bigger data types (max 256 bits/32 bytes), which means more pixels "painted" at a time.
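
To make the "painting" idea concrete, here is a minimal sketch that fills a backbuffer four pixels at a time with SSE2. It assumes each pixel is one 32-bit value (a hypothetical layout, for example BGRA with 8 bits per channel), and the function name `clear_pixels_sse2` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Set every pixel in the buffer to the same color.
   Assumption: one uint32_t per pixel, so 128 bits hold 4 pixels. */
void clear_pixels_sse2(uint32_t *pixels, size_t count, uint32_t color)
{
    __m128i four = _mm_set1_epi32((int)color);  /* same color in all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= count; i += 4)
        _mm_storeu_si128((__m128i *)(pixels + i), four);  /* "paint" 4 pixels */
    for (; i < count; i++)                                /* leftover pixels */
        pixels[i] = color;
}
```

With AVX2 the same loop could use 256-bit stores and cover 8 such pixels per store.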

Technologies (some of them)
MMX ==> Old stuff. Sometimes useful. (64 bits).
SSE family ==> Old but gold. Now, we're talking! (64 bits; 128 bits).
AVX family ==> The first member came out in 2011. Not too old. (128 bits; 256 bits)
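
Since the answer to "how many at a time" is "that's up to your processor", a program can ask at run time. The sketch below assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available (MSVC would use `__cpuid` instead). The function name `widest_simd` is hypothetical.

```c
/* Report the widest SIMD family the running CPU supports.
   Assumption: GCC/Clang builtins on an x86 target. */
const char *widest_simd(void)
{
    __builtin_cpu_init();  /* harmless if the runtime already initialized it */
    if (__builtin_cpu_supports("avx512f")) return "AVX-512 (512 bits)";
    if (__builtin_cpu_supports("avx2"))    return "AVX2 (256 bits)";
    if (__builtin_cpu_supports("avx"))     return "AVX (256 bits)";
    if (__builtin_cpu_supports("sse2"))    return "SSE2 (128 bits)";
    if (__builtin_cpu_supports("sse"))     return "SSE (128 bits)";
    if (__builtin_cpu_supports("mmx"))     return "MMX (64 bits)";
    return "none detected";
}
```

A game could call this once at startup and pick the matching rendering routine.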

OBS N2 (convention): I use "processor" loosely here for both GPUs and CPUs (yes, I know they do different stuff).

[Important] OBS N3: SIMD's impact on performance varies depending on the specific tasks, algorithms, and the SIMD instructions supported by the processor.
Useful for (examples): video games, video processing, image processing, etc.
Not so useful for (examples): code that's not performance critical, a language that doesn't have good SIMD support, etc.

If there is anything wrong in my answer, please let me know.

I'd like to credit (source info):

#1) The Intel Intrinsics Guide
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=6577

#2) SIMD for C++ Developers © 2019-2021 Konstantin http://const.me/articles/simd/simd.pdf#:~:text=The%20acronym%20stands%20for%20%E2%80%9Csingle%20instruction%2C%20multiple%20data%E2%80%9D.,hold%20these%20multiple%20values%20in%20a%20single%20register.

#3) Ryan Ries - EP0011 - Importing functions from DLLs and intro to SIMD - Making a video game from scratch in C
https://www.youtube.com/watch?v=DGY28XLcTcM&list=PLlaINRtydtNWuRfd4Ra3KeD6L9FP_tDE7&index=11
