Transform a vector of elements into another vector of elements in parallel, without requiring the latter to be initialized

Question

I have a const std::vector<T> source of a big size, and a std::vector<T> dest. I want to apply a transformation to each element in source and store it in dest; all of this in parallel, and knowing that T is not default constructible.

What I initially attempted was using std::transform with a parallel execution policy:

std::transform(std::execution::par_unseq,
  source.begin(), source.end(),
  dest.begin(),
  [](const T& elem) { return op(elem); }
);

However, when I first compiled and run this, to my surprise, I discovered that, although the transform "loops" for source.size() times, dest's content remains unchanged.

I discovered that this is because dest must have the same size of source prior to the transform. However, I cannot do resize dest to the size of source, because T is not default constructible as it does not have a default constructor. I also would prefer not to provide a default constructor for it (first of all, it does not make sense in T's logic, but you can think that it would be expensive to call).

Does C++ STL offer any other algorithm for achieving what I have in mind?

What would suffice is an algorithm where each thread computes its own part of the source vector, and then the results are collected and joined into the same dest vector.

A `for_each` that pushes the result into the destination `vector`? — user4581301, Jun 08 '22 at 22:45
The requirement that this must be done in parallel and that `result` is initially empty seems to suggest that there would be concurrency issues in pushing the transformed elements :( — steddy, Jun 08 '22 at 22:48
Reminder: The source and destination memory must be in the processor's data cache to avoid cache misses (requiring reloading of the cache). You may gain some speed by performing this in parallel, but the data cache misses may use up any gains from parallel execution. — Thomas Matthews, Jun 08 '22 at 23:03
I recommend performing the transformation in blocks that perform in parallel. For example, load source into the data cache, load destination into the data cache, then transform in parallel. Repeat. — Thomas Matthews, Jun 08 '22 at 23:04
@ThomasMatthews This is precisely what `std::transform` does: it spawns some threads (8 on my machine) and each thread transforms 1/8 of the `source` vector and stores each element in the corresponding `dest` location. — steddy, Jun 08 '22 at 23:09
If you want to end up with everything in one vector, then you could use something like OpenMP to run a known number of threads, each transforming one block of the input as @Thomas suggests into a separate vector. Then, when all that completes, move your transformed blocks into `dest` (which, I assume, is cheap) and then discard them. Sounds fun to write. — Paul Sanders, Jun 08 '22 at 23:09
Having said which, I don't understand what type of object cannot be cheaply default-constructed. Perhaps you could share some details. — Paul Sanders, Jun 08 '22 at 23:12
It sounds like in an ideal world, one would `reserve()` capacity in the output vector and then `emplace()` each item `T` in their desired positions... Unfortunately of course, vector reserve and emplace don't quite work like that —what is being described is "allocate memory for and create a reserved _slot_ for each item and then construct them in-place later". — saxbophone, Jun 08 '22 at 23:13
I wonder if using `std::vector>` for the output array would be of any help? Theoretically, that way you can "instantiate" a number of empty placeholders for `T` without calling T's default ctor (I think) and then populate them as-needed, probably without needing much extra work given value-assignment semantics of `std::optional<>` — saxbophone, Jun 08 '22 at 23:19
@saxbophone Exactly. I also tried with `dest.reserve(source.size())`, but it is useless, as it only reserves memory, but does not change the vector's size, which is needed by `transform` — steddy, Jun 08 '22 at 23:22
@saxbophone the `optional`'s idea is interesting, I didn't think about it. It's not really idiomatic as it is more of a bypass, but it would for sure work! — steddy, Jun 08 '22 at 23:27
@steddy I agree it feels a bit awkward, I personally cannot see another way to achieve what you describe given your constraints and those of the stdlib container types (without creating our own container, which would be a far more elaborate and costly potential solution). — saxbophone, Jun 08 '22 at 23:30
Why can't you resize dest first by calling the non-default constructor? `dest.resize(src.size(), T(someNonDefaultParameter))`; or first create a copy of src, then overwrite it in transform. Also, is it a requirement to use `std::transform`? Is it acceptable to directly use `std::thread` ? — Ari, Jun 08 '22 at 23:31
@PaulSanders The struct `T` in question contains Eigen (a linear algebra library) fields. In reality, t can be constructed cheaply (if you look at my previous post, I talk about it); however, it still constitutes unnecessary work (I would essentially create a vector of temporary objects to overwrite them later), and requires the struct to have a default constructor (I can define it, but it's ugly as it contrasts with the application logic). So, the justifications I made in the original post are a bit different, but they made it shorter to explain. — steddy, Jun 08 '22 at 23:34
@Ari You are right, I could call `resize` passing the non-default constructor, but it is still "useless work", as the default constructed object is shortly after to be overwritten (more precisely, assigned). Similarly, I could copy `src` and then overwrite it in place, but it still involves a copy... Why can't I just transform the elements and put them in some un-initialized but allocated memory? — steddy, Jun 08 '22 at 23:48
There are some tricks to resize a vector without calling the per element constructor. See this: https://stackoverflow.com/a/11925009/1383356 — Ari, Jun 09 '22 at 00:32
@steddy apparently someone doesn't want you to know this since they deleted my last comment saying exactly this, but: `you can put them in uninitalized, but allocated memory, just not (currently) with vector. You'll have to pick a different data structure if you want to do that.` — Taekahn, Jun 09 '22 at 13:25

score 6 · Accepted Answer · answered Jun 08 '22 at 23:45

6

Try using a std::vector<> of std::optional<T>:

#include <algorithm>
#include <execution>
#include <optional>
#include <vector>

struct T { // NOTE: T is NOT default-constructible!
    T(int x) : value(x) {}
    operator int() { return value; }
    int value;
};

T op(const T& in) {
    return in.value * 2;
}

#include <iostream>

int main() {
    const std::vector<T> source = {4, 5, 6, 7, 8, 9, 10, 11};
    std::vector<std::optional<T>> dest(source.size());

    std::transform(
        std::execution::par_unseq,
        source.begin(), source.end(),
        dest.begin(),
        [](const T& elem) { return op(elem); }
    );
    for (auto i : dest) {
        std::cout << *i << " ";
    }
    std::cout << std::endl;
}

Sample output: 8 10 12 14 16 18 20 22

Godbolt demo: https://godbolt.org/z/ecf95cneq

This is admittedly not that idiomatic, as your dest array now has every element wrapped up in an std::optional<> container, but I believe it otherwise offers the semantics you specify.

answered Jun 08 '22 at 23:45

saxbophone

779
1
6
22

Definitely the answer I will accept. Before closing, I will wait a bit if someone else's has a more idiomatic approach. Thank you! – steddy Jun 08 '22 at 23:55
Thank you, yes, I am interested to see if there is a better approach than this one. The only other approaches I can think of are: create your own vector-like container class that provides the allocation semantics you specify (a lot of work to go through for one thing!) or _maybe_ writing your own custom allocator, but I have no experience of doing that second one so I can't say for sure... – saxbophone Jun 08 '22 at 23:58
An inventive solution, I like it. – Paul Sanders Jun 09 '22 at 16:15
@saxbophone After a bit of thought, posted my own modest effort :) – Paul Sanders Jun 12 '22 at 23:30

score 1 · Answer 2 · answered Jun 12 '22 at 23:29

Also inspired by @saxbophone's answer, here's a solution that avoids the (minor) overhead that std::optional introduces. It is based on the following simple, roll-your own container:

template <class T> class SimpleVector
{
public:
    SimpleVector (size_t capacity) :
        m_capacity (capacity), storage (static_cast <T *> (malloc (capacity * sizeof (T)))) { }
    ~SimpleVector () { free (storage); }

    T *data () { return storage; }
    T &operator [] (size_t i) { return storage [i]; }
    T *begin () { return storage; }
    T *end () { return storage + m_capacity; }

private:
    size_t m_capacity;
    T *storage;
};

This requires one T to be constructed and subsequently moved for each element generated by std::transform, which is the same as the std::optional solution (I did test this). Note the use of malloc and free to avoid default-constructing any Ts.

The complete code is here and looks like this (it generates the same output as @sax):

#include <iostream>
#include <algorithm>
#include <execution>
#include <vector>
#include <utility>

template <class T> class SimpleVector
{
public:
    SimpleVector (size_t capacity) :
        m_capacity (capacity), storage (static_cast <T *> (malloc (capacity * sizeof (T)))) { }
    ~SimpleVector () { free (storage); }

    T *data () { return storage; }
    T &operator [] (size_t i) { return storage [i]; }
    T *begin () { return storage; }
    T *end () { return storage + m_capacity; }

private:
    size_t m_capacity;
    T *storage;
};

struct T
{
    T (int x) : value (x) { }
    operator int () const { return value; }
    int value;
};

T op (const T& in) { return in.value * 2; }

int main ()
{
    const std::vector <T> source = {4, 5, 6, 7, 8, 9, 10, 11};
    SimpleVector <T> dest (source.size ());

    std::transform (std::execution::par_unseq,
        source.cbegin (), source.cend (), dest.begin (),
        [] (const T& elem) { return op (elem); } );

    for (const auto &it : dest) { std::cout << it << " "; }
}

Feel free to use SimpleVector for anything you like if you so choose. You can always add accessor / convenience methods to it if you need them.

Ari · Answer 3 · 2022-06-09T02:53:26.803

0

Inspired by @saxbophone answer, you could also use a vector of pointers.

Credit for the code template to @saxbophone.

#include <algorithm>
#include <execution>
#include <optional>
#include <vector>

struct T { // NOTE: T is NOT default-constructible!
    T(int x) : value(x) {}
    operator int() { return value; }
    int value;
};

T* op(const T& in) {
    return new T(in.value*2);
}

#include <iostream>

int main() {
    const std::vector<T> source = {4, 5, 6, 7, 8, 9, 10, 11};
    std::vector<T*> dest(source.size());

    std::transform(
        std::execution::par_unseq,
        source.begin(), source.end(),
        dest.begin(),
        [](const T& elem) { return op(elem); }
    );
    for (auto i : dest) {
        std::cout << *i << " ";
    }
    std::cout << std::endl;

    // Don't forget to delete dest.
}

Live example: https://godbolt.org/z/r73qr1ze3

edited Jun 09 '22 at 02:53

answered Jun 09 '22 at 00:27

Ari

7,251
11
40
70

2

A vector of `std:: unique_ptr`s would make more sense, but.this approach is not cache-friendly. – Paul Sanders Jun 09 '22 at 12:20
I agree. But nor is std:: optional, right? – Ari Jun 09 '22 at 14:32
2

`std::optional` is a weird duck. It's not allowed to dynamically allocate, so whether or not the value is present, the memory for it is. `std::optional` will be as cache friendly as the contained class allows it to be. – user4581301 Jun 09 '22 at 15:26
1

@user4581301 good point about `optional`, it looks like it uses placement-new to ensure the behaviour that you mention: https://stackoverflow.com/questions/25844598/implementation-of-the-stdoptional-class – saxbophone Jun 09 '22 at 20:27

Transform a vector of elements into another vector of elements in parallel, without requiring the latter to be initialized

3 Answers3