2

I'm trying to parallelize some vector functions in a struct using openMP. While it works well with most of my implementations, I find that since the constructor for std::vector<> has a linear complexity, I can't get better performance and instead get something that's even worse than doing it sequentially for initialization.

Here's one of the initializors

         /**
         * @brief Construct a new constant parallel Vector object with a given value constantEntry
         * 
         * @param dim 
         * @param constantEntry 
         */
        parallelVector(const int dim, const double constantEntry){
            dimension = dim;
            values = std::vector<double>(dimension);

            #pragma omp parallel for schedule(static)
            for (int i=0 ; i<dimension; i++){
                values[i] = constantEntry;
            }
        }

The std::vector<> documentation says I can get O(1) complexity using allocators, but since I'm not too familiar with them, I was wondering if something with unique pointers is possible instead?

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
Michael B
  • 53
  • 6
  • 6
    If there are N elements in the vector, how are you supposed to put them in the vector in less than O(N) work? You mean do it in M threads? That makes it take O(N/M) time, which is still O(N). Do you just want to create the vector memory buffer without initializing it? – Yakk - Adam Nevraumont Mar 29 '21 at 18:07
  • Well ideally I'd hope to create the buffer and only have to initialize it in the for loop. So that's O(N/M) like you said. I read on the documentation page that one could use allocators to make the memory buffer with O(1), so that's what I meant. Possibly poorly worded. – Michael B Mar 29 '21 at 18:15
  • FWIW, `values = std::vector(dimension, constantEntry);` gets rid of the loop and for a good vector implementation, it should get some sort of low level parallelization, like SIMD. It should also be moved into the [member initialization list](https://stackoverflow.com/questions/1711990/what-is-this-weird-colon-member-syntax-in-the-constructor) – NathanOliver Mar 29 '21 at 18:21
  • not sure but my guess is that `values = std::vector(dimension);` is more expensive than the whole loop and I wouldnt expect any speedup by using more than one thread. Note that you are mixing things up a bit. You are talking about the constructor and allocations, but your code is about assigning to elements of an already constructed vector in parallel. Did you measure the two parts seperately? – 463035818_is_not_an_ai Mar 29 '21 at 18:26
  • Well yes, right now since the line `values = std::vector(dimension);` is already O(N), there's nothing I can do to speed it up in the loop. I used `values = std::vector(dimension, constantEntry);` for my "sequentialVectors" struct, but I just wanted to know if there's a way where I could just create the memory buffer without having to initialize, and then use the assignment as in the for loop in the code. – Michael B Mar 29 '21 at 18:33
  • @MichaelB You can use a custom allocator that does not perform zero-initialization during resizing. Then, you can assign element values in a parallel loop. – Daniel Langr Mar 29 '21 at 18:40

1 Answers1

5
template<class T>
struct uninitialized_allocator:std::allocator<T> {
  template<class...Us>
  void construct( T* p, Us&&... us ){
    ::new((void*)p) T(std::forward<Us>(us)...);
  }
  // don't construct when passed 0 arguments
  void construct( T* p ) {};
};

then later:

    int dimension = 0;
    std::vector<double, uninitialized_allocator<double>> values;
    parallelVector(const int dim, const double constantEntry):
      dimension(dim),
      values(dim)
    {
        #pragma omp parallel for schedule(static)
        for (int i=0 ; i<dimension; i++){
            values[i] = constantEntry;
        }
    }

note, however, that you become responsible for initializing the new vector elements on calls to resize as well as well as emplace_back() with 0 arguments.

In theory, std::vector still calls construct(T*) on each and every element, but that is a noop, and compilers are good at dead-code elimination. So under non-trivial optimization settings this should do what you want.

Note that I changed a use of operator= into construction; they are not the same, and vector is free to behave very differently in the two cases.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • Thanks. I think this is what I was hoping for. But I notice I don't really get the speedup I expected- do you think that's because of the `construct(T*)` calls? – Michael B Mar 29 '21 at 19:08
  • @MichaelB What optimization settings are you using? How are you profiling? Note that std vector still gets memory from the central heap, which can be costly, and on some platforms zeroed memory *isn't zeroed until you access it* anyhow. – Yakk - Adam Nevraumont Mar 29 '21 at 19:10
  • I run /O1 and don't use a profiler, just a simple measurement with `std::chrono`. – Michael B Mar 29 '21 at 19:17