3

I am aware that the valarray class was originally implemented with the aim to optimize high-speed numerical computations.

As a drawback, many of the design aspects that support this goal - for instance the aliasing restrictions on its elements (a restrict-like mechanism), or the absence of range checking - impose cumbersome limitations on the developer and increase the risk of runtime errors. The inability of valarray to grow via push_back() or emplace_back(), as vector can, is also an issue.

On the other hand, valarray's attractiveness resides in its capability to render vector operations as a scalar expression:

#include <valarray>
using std::valarray;

int main() {
    valarray<int> a = {1, 2, 3}, b = {4, 5, 6};
    valarray<int> c = a + b;
    // instead of a loop, or something like
    // transform(begin(a), end(a), begin(b), begin(c), plus<int>());
}

Users appreciate that such concise notation comes natively with the language, as it does in Fortran and Matlab.

Its chief advantage is that it eliminates the need to resort to external libraries like Eigen or Blitz++, or to fancy constructs like expression templates.
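For context, an expression template is exactly this kind of "fancy construct": operator+ returns a lightweight proxy that records the operation instead of computing it, and the whole expression collapses into a single loop only on assignment, avoiding intermediate temporaries. A minimal sketch (all names here are illustrative, not from any real library):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Proxy that records an elementwise sum without evaluating it.
template <class L, class R>
struct SumExpr {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n) : data(n) {}
    double& operator[](std::size_t i) { return data[i]; }
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression: the single loop where work happens.
    template <class E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i != size(); ++i) data[i] = e[i];
        return *this;
    }
};

// operator+ only builds a proxy; no temporary vector is allocated,
// so a + b + c still evaluates in one pass over the data.
template <class L, class R>
SumExpr<L, R> operator+(const L& a, const R& b) { return {a, b}; }
```

With this in place, `d = a + b + c;` nests two proxies and runs one loop on assignment; a naive operator+ returning a container would instead allocate and fill a temporary per `+`.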

It's easy to emulate the vector sum in the code above with a non-member operator+ for a custom myVector class:

myVector<int> operator+(const myVector<int>& a, const myVector<int>& b) {
    myVector<int> c(a.size());   // the result must outlive the function, so return by value
    for (size_t i = 0; i != a.size(); ++i)
        c[i] = a[i] + b[i];

    return c;
}

But I doubt valarray was implemented so simplistically.

Actually, I have read that modern CPUs have native SIMD (Single Instruction, Multiple Data) capability. That is, they can apply the same instruction to a chunk of multiple, contiguous data. This is hardware-level vectorization, activated automatically by a (modern enough) compiler when optimizing code at compile time.

Apparently the most a programmer can do to entice the compiler to use SIMD is to lay out the data so they are stored contiguously, and to employ STL algorithms in place of hand-written loops. As an aside, this is very similar to what GPUs do to facilitate computations involving multidimensional vectors.
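To make that advice concrete, here is the same elementwise sum written in the style described above - contiguous std::vector storage plus a standard algorithm instead of a hand-written loop. In my understanding, at -O2/-O3 mainstream compilers will typically auto-vectorize this pattern, though the only way to be sure is to inspect the generated assembly:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

std::vector<int> add(const std::vector<int>& a, const std::vector<int>& b) {
    // Contiguous storage + a standard algorithm: a pattern the
    // auto-vectorizer recognizes readily when optimizations are on.
    std::vector<int> c(a.size());
    std::transform(a.begin(), a.end(), b.begin(), c.begin(), std::plus<int>{});
    return c;
}
```
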

Given all of the above, it seems logical to me that the use of valarray should automatically spur the compiler to implement SIMD. Is this the case?

I know Intel dug up valarray from obscurity a few years ago, and now offers libraries of vectorized mathematical functions. Have they managed to tweak valarray for SIMD use?

Giogre
  • 5
    It should be capable. Only way to really know though is to compile the code and check the assembly. – NathanOliver Jul 27 '22 at 12:42
  • Did you try experimenting at Godbolt? – Paul Sanders Jul 27 '22 at 12:47
  • 3
    Live demo: https://godbolt.org/z/hqrWTM5n1. Note that those `vpaddd` instructions are SIMD instructions. – Daniel Langr Jul 27 '22 at 12:57
  • 1
    @DanielLangr `-mavx2` 426 ms vs `-mavx512f` 330 ms on my station. Didn't know godbolt was so complete, you need to inspect instruction labels one by one though. – Giogre Jul 27 '22 at 13:05
  • 2
    not quite yet [Why don't std::valarray get more attention from all C++ compilers? Expression templates mean big speedup to certain math-heavy tasks](https://stackoverflow.com/q/71951442/995714), [Why is valarray so slow?](https://stackoverflow.com/q/6850807/995714), [Why is valarray so slow on Visual Studio 2015?](https://stackoverflow.com/q/56050322/995714) – phuclv Jul 27 '22 at 14:25

1 Answer

1

Intel has its own implementation of valarray, which is used to achieve performance benefits and parallelism.

Intel's valarray implementation uses the Intel® Integrated Performance Primitives (Intel® IPP), which are part of the Intel oneAPI Base Toolkit.

Please refer to the link below:

https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/libraries/intel-c-class-libraries/intel-s-valarray-implementation.html