Let's say I have a struct

typedef struct point {
  float x;
  float y;
  float z;
} point;

I have an array of these structs, and I want to do the following:

std::vector<point> copyArray;

for(auto p : array_of_points){
   
    point newPoint;

    newPoint.x = p.x;
    newPoint.y = p.y;
    newPoint.z = p.z;

    copyArray.push_back(newPoint);
}

Now this particular section in my code can be accelerated using vector operations, if I can operate on multiple structs at once.

I have a two-part question:

  • How can this be done using SIMD intrinsics? I am not sure how I would load the structs.
  • Can OpenMP achieve the desired vectorization? I am not that comfortable with assembly, so I could not tell whether the loop was actually being vectorized efficiently.
mkrieger1
Atharva Dubey
  • If all you want to do is copy the vector then just do `std::vector copyArray(array_of_points.begin(), array_of_points.end())`, the compiler is likely to optimise this to a SIMD intrinsic for you – Alan Birtles Oct 23 '21 at 15:27
  • You could use SSE or AVX to load and store vectors – an inconspicuous semicolon Oct 23 '21 at 15:28
  • SIMD generally works better when you have 3 separate arrays, one each for x, y, and z. You *can* do stuff with a geometry vector inside a single SIMD vector, but it's clunky and slower (for stuff other than copying) than processing 4 xyz geometry vectors at a time, even if you pad your SIMD vectors with xyzw with unused w. See https://stackoverflow.com/tags/sse/info, especially [Slides + text: SIMD at Insomniac Games (GDC 2015)](https://deplinenoise.wordpress.com/2015/03/06/slides-simd-at-insomniac-games-gdc-2015/) which specifically covers this SIMD anti-pattern. – Peter Cordes Oct 23 '21 at 15:31
  • e.g. [this code optimises](https://godbolt.org/z/7raY43Wcx) down to a `memcpy` which will use intrinsics internally – Alan Birtles Oct 23 '21 at 15:34
  • For SIMD you should use [SoA instead of AoS](https://en.wikipedia.org/wiki/AoS_and_SoA). See [Structure of Arrays vs Array of Structures](https://stackoverflow.com/q/17924705/995714), [Improving Vectorization Efficiency using Intel SIMD Data Layout Template](https://www.intel.com/content/dam/www/public/us/en/documents/presentation/improving-vectorization-efficiency.pdf) – phuclv Oct 23 '21 at 16:05
  • What is `array_of_points`? If it was also a `std::vector`, just copy-construct `copyArray` from that. If it was a vector of different structs (different type of `x,y,z`, or additional member variables) the question would be different. You should always provide a [mre]! – chtz Oct 24 '21 at 00:48
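
To illustrate the SoA layout that the comments above recommend, here is a minimal sketch. The names `points_soa`, `to_soa`, and `scale` are hypothetical and do not appear in the question. It assumes the data is converted once and then processed by loops that touch one component array at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structs element, as in the question.
struct point {
    float x;
    float y;
    float z;
};

// Hypothetical struct-of-arrays container: one contiguous array per component.
struct points_soa {
    std::vector<float> x;
    std::vector<float> y;
    std::vector<float> z;
};

// Convert AoS to SoA once, so later loops read contiguous x, y and z values.
points_soa to_soa(const std::vector<point>& aos) {
    points_soa soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    for (const point& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
    }
    return soa;
}

// Example per-component loop with unit-stride accesses, the pattern
// compilers can usually auto-vectorize.
void scale(points_soa& pts, float factor) {
    assert(pts.x.size() == pts.y.size() && pts.y.size() == pts.z.size());
    const std::size_t n = pts.x.size();
    for (std::size_t i = 0; i < n; ++i) {
        pts.x[i] *= factor;
        pts.y[i] *= factor;
        pts.z[i] *= factor;
    }
}
```

For a pure copy the layout matters little, since both layouts are contiguous bytes. The SoA form pays off once the loop does arithmetic on the components.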

1 Answer

I would expect that the loop cannot be vectorized, because `copyArray.push_back(newPoint);` accesses a shared resource.

If you want to speed this up, you might want to look into how to convert arrays into vectors quickly. You could start your search here.
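
As a rough sketch of converting an array into a vector in one step, the iterator-range constructor of `std::vector` can replace the element-by-element loop. The helper name `copy_points` is hypothetical, and the example assumes the source is either a raw array of `point` or a `std::vector<point>`; the question does not say which.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct point {
    float x;
    float y;
    float z;
};

// Copy a raw array of points into a vector using the iterator-range
// constructor. The vector allocates once and copies all elements in bulk.
std::vector<point> copy_points(const point* array_of_points, std::size_t count) {
    assert(array_of_points != nullptr || count == 0);
    return std::vector<point>(array_of_points, array_of_points + count);
}

// If the source is already a std::vector<point>, plain copy construction
// is enough.
std::vector<point> copy_points(const std::vector<point>& array_of_points) {
    return array_of_points;
}
```

Because `point` is trivially copyable, standard library implementations typically reduce this kind of copy to a single bulk memory copy, although that is an implementation detail rather than a guarantee.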

Anthrados
  • `std::vector` isn't thread-safe, so any thread accessing it will assume it's the only thread. (In C++, it would be data-race undefined-behaviour to have multiple threads doing `.push_back()` on the same `std::vector` at the same time, so compilers can and do assume that doesn't happen. That's why compilers are able to optimize normal code well.) – Peter Cordes Oct 23 '21 at 16:07
  • So TL;DR, even if `std::vector copyArray;` were global, not a local inside the same function, that wouldn't stop auto-vectorization. If the compiler manages to untangle the copy in 3x4 = 12-byte chunks into a simple memcpy, it should vectorize it or recognize it as a `memcpy`. – Peter Cordes Oct 23 '21 at 16:51
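
For the OpenMP part of the question, one possible sketch is to size the destination up front and copy by index, so the loop body is a plain 12-byte struct assignment with no `push_back`. The function name `copy_points_simd` is hypothetical, and pre-sizing the vector is an assumption of this example rather than something the thread prescribes. The `#pragma omp simd` line is only a hint: it takes effect when the compiler is invoked with OpenMP support (for example `-fopenmp` or `-fopenmp-simd` on GCC and Clang) and is ignored otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct point {
    float x;
    float y;
    float z;
};

std::vector<point> copy_points_simd(const std::vector<point>& array_of_points) {
    // Allocate all elements first so the loop never reallocates.
    std::vector<point> copyArray(array_of_points.size());

    const point* src = array_of_points.data();
    point* dst = copyArray.data();
    const std::size_t n = array_of_points.size();
    assert(n == copyArray.size());

    // Ask the compiler to vectorize; without OpenMP flags this pragma is ignored
    // and the loop still produces the same result.
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];  // copies x, y and z as one 12-byte struct
    }
    return copyArray;
}
```

Whether this beats plain copy construction depends on the compiler, which may turn either form into the same bulk copy. Checking the generated code (for example on Compiler Explorer, as in the comment above) is the reliable way to find out.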