
I read here that Intel introduced SSE 4.2 instructions for accelerating string processing.

Quote from the article:

The SSE 4.2 instruction set, first implemented in Intel's Core i7, provides string and text processing instructions (STTNI) that utilize SIMD operations for processing character data. Though originally conceived for accelerating string, text, and XML processing, the powerful new capabilities of these instructions are useful outside of these domains, and it is worth revisiting the search and recognition stages of numerous applications to utilize STTNI to improve performance

  • Does gcc make use of these instructions if they are available?
  • If so, which version?
  • If it doesn't, are there any open source libraries which offer this?
Glorfindel
Steve Lorimer
  • I doubt that GCC will be able to recognize a specific task as being text-processing and use these instructions automatically. But I wouldn't be surprised if functions like `strcpy()` are done using SSE4.2 by the compiler. – Mysticial May 14 '13 at 01:10
  • @Mysticial I guess I was referring to `strcpy` et al; but more importantly `atoi` etc, and their c++11 equivalents `std::stoi` etc – Steve Lorimer May 14 '13 at 01:12
  • See http://stackoverflow.com/questions/7919304/gcc-sse-code-optimization. You need to tell gcc. Libraries are already compiled, so only your code will be affected, unless you recompile the libraries too. – imel96 May 14 '13 at 01:32
  • @Mysticial from `man gcc` I see it uses `-mfpmath=sse` by default on `x86-64`, which enables `SSE/SSE2`. From this I infer that I need to add `-msse4.2` - would you agree? – Steve Lorimer May 14 '13 at 01:54
  • @lori I've never actually relied on compiler vectorization, so I can't say for sure it will actually do it. But at least `-msse4.2` will enable the SSE4.2 intrinsics header as well as whatever SSE4.2 optimizations the compiler supports. – Mysticial May 14 '13 at 01:58
  • @lori so it looks like `gcc` supports the packed compare intrinsics http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_sse42_comp.htm via `smmintrin.h` – Shafik Yaghmour May 15 '13 at 03:00

2 Answers


As for software libraries, I would look at Agner Fog's asmlib. It is a collection of routines optimized in assembly, including several string manipulation functions that use SSE4.2. It also provides functions, which I use myself, that report information about the CPU, such as the cache size at each level and which extensions (e.g. SSE4.2) are supported.

http://www.agner.org/optimize/asmlib.zip

To enable SSE4.2 in GCC, compile with `-msse4.2`. If you have a processor with AVX, you can use `-mavx` instead, which also enables SSE4.2.
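If you would rather not build the whole program with `-msse4.2`, one option is to enable it for a single function and pick the code path at run time. Below is a minimal sketch of that idea, assuming GCC (or Clang) on x86. The helper names `find_any`, `find_any_sse42` and `find_any_scalar` are made up for illustration; only `_mm_cmpistri`, `_mm_loadu_si128`, `__builtin_cpu_supports` and the `target` attribute come from the compiler. It finds the first character of a C string that belongs to a small set (at most 16 characters), which is the kind of search these instructions were designed for.

```c
#include <stddef.h>
#include <string.h>

/* Portable fallback: index of the first byte of s[0..len) that occurs in
   `set`, or len if there is none. */
static size_t find_any_scalar(const char *s, size_t len, const char *set)
{
    for (size_t i = 0; i < len; i++)
        if (strchr(set, s[i]) != NULL)
            return i;
    return len;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h>   /* SSE4.2 intrinsics, including _mm_cmpistri */
#define HAVE_SSE42_PATH 1

/* SSE4.2 variant (hypothetical helper). The target attribute enables
   SSE4.2 for this function only, so no -msse4.2 is needed on the command
   line. Assumes len == strlen(s) and strlen(set) <= 16. */
__attribute__((target("sse4.2")))
static size_t find_any_sse42(const char *s, size_t len, const char *set)
{
    char setbuf[16] = {0};
    memcpy(setbuf, set, strlen(set));
    __m128i needles = _mm_loadu_si128((const __m128i *)setbuf);

    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(s + i));
        /* EQUAL_ANY: index of the first byte in chunk that matches any
           byte of needles, or 16 if nothing matches. */
        int idx = _mm_cmpistri(needles, chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16)
            return i + (size_t)idx;
    }
    /* Fewer than 16 bytes left: finish with the scalar loop. */
    return i + find_any_scalar(s + i, len - i, set);
}
#endif

/* Index of the first character of s that is in set, or strlen(s). */
size_t find_any(const char *s, const char *set)
{
    size_t len = strlen(s);
#ifdef HAVE_SSE42_PATH
    if (strlen(set) <= 16 && __builtin_cpu_supports("sse4.2"))
        return find_any_sse42(s, len, set);
#endif
    return find_any_scalar(s, len, set);
}
```

For example, `find_any("hello, world", ",")` returns 5 on either path. Whether the SSE4.2 path is actually faster than the scalar one depends on the workload, so measure before relying on it.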

  • glibc does runtime dispatching to hand-written asm; you don't need `-msse4.2` for it to use an SSE4.2 version of `strstr`. (SSE4.2 instructions aren't faster for `strcmp` or `strchr` or other simple functions. [How much faster are SSE4.2 string instructions than SSE2 for memcmp?](https://stackoverflow.com/q/46762813) - they're not) – Peter Cordes Apr 18 '22 at 20:57

I'm not sure whether gcc itself emits these instructions, but it shouldn't matter, because text processing is generally done through glibc. If you use the standard string functions from `string.h` (`cstring` should behave the same) and have a reasonably recent glibc, you should be getting the optimized versions automatically.

After some searching, it seems glibc 2.15 (and possibly older versions) already has SSE4.2 `strcasecmp` optimizations:

http://upstream.rosalinux.ru/changelogs/glibc/2.15/changelog.html
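In practice this means ordinary calls are all you need to write. Here is a small sketch with no special compiler flags and no intrinsics. The wrapper names `has_token` and `same_ignoring_case` are hypothetical and exist only for illustration. Which implementation the C library selects behind `strstr` and `strcasecmp` depends on the glibc version and the CPU it runs on, and the source code does not change either way.

```c
#include <string.h>    /* strstr */
#include <strings.h>   /* strcasecmp (POSIX) */

/* Returns 1 if token occurs anywhere in text, 0 otherwise.
   Any optimized variant is chosen inside the C library, not here. */
int has_token(const char *text, const char *token)
{
    return strstr(text, token) != NULL;
}

/* Returns 1 if a and b are equal ignoring case, 0 otherwise. */
int same_ignoring_case(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}
```

Compiling this with plain `gcc file.c` is enough. `-msse4.2` only affects code the compiler generates for your own translation units, not the routines already compiled into the library.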

linuxbuild
tothphu