Is there some STL function to get cartesian product of two C++ vectors?

Question

Suppose

b =  ["good ", "bad "]
a  = ["apple","mango"]
then output = ["good apple","good mango","bad apple","bad mango"]

I know this can be done with nested for loops but is there some elegant one liner for doing this using C++ STL?

There probably is, but why make everything harder to understand? Loops are not evil. — deviantfan, Jun 09 '16 at 13:32
Possible duplicate of [Construction a vector from the concatenation of 2 vectors](http://stackoverflow.com/questions/35485866/construction-a-vector-from-the-concatenation-of-2-vectors) — Jonathan Mee, Jun 09 '16 at 13:36
@JonathanMee: It's not. This creates an output of length M*N, that creates an output of length N+M. — MSalters, Jun 09 '16 at 13:37
This isn't an elementwise concatenation; you're asking for something more like a product of some sort of two vectors. — , Jun 09 '16 at 13:38
`for (auto i : a) for (auto j : b) output.push_back(i + ' ' + j);` really? — Nim, Jun 09 '16 at 13:41
@Nim: I'd add a `.reserve( )` but yes, it's about that simple. — MSalters, Jun 09 '16 at 13:52
@Jonathan Mee I don't think it is a duplicate, the OP wants the concatenation of the cartesian product of the two vectors. — Alessandro Teruzzi, Jun 09 '16 at 14:25
@AlessandroTeruzzi Yes I had misunderstood the question. MSalters had already corrected me :J Though I suppose it is my fate to receive boundless corrections for such a mistake. — Jonathan Mee, Jun 09 '16 at 14:33
The name of the operation OP asks for is ["Cartesian product"](https://en.wikipedia.org/wiki/Cartesian_product) — Ilya Popov, Jun 09 '16 at 14:51
@Janathan Mee sorry about that, I should have read all the comments before typing :) — Alessandro Teruzzi, Jun 09 '16 at 15:01
Thanks for all your answers and suggestions, I thought there might be some sleek function to get the thing done because of my primary experience with functional languages where such things are common but it seems C++ doesn't has any such shorthand though some of techniques shared below are really great to know about. — Abhishek Kumar, Jun 09 '16 at 15:52
@einpoklum maybe a flattened cartesian product matrix thats why I was reluctant to call it that — Abhishek Kumar, Jun 09 '16 at 15:55
@AbhishekKumar: If anything, it looks a bit like an inner product of two vectors, but with `+` instead of `*`. — einpoklum, Jun 09 '16 at 16:58
The standard library algorithms and containers are not meant to be the ultimate all-encompassing library, but rather serve as examples for your own designs (in addition to being useful on their own). — n. m. could be an AI, Jun 10 '16 at 07:14
I need this functionality too. STL lacks some sort of map algorithm (from map-reduce, not a data structure) where an element from first sequence and the second sequence can be mapped to produce third sequence. — vSzemkel, Sep 27 '20 at 07:41

score 4 · Answer 1 · edited May 23 '17 at 11:58

4

Given vector<string> a and vector<string> b you can use for_each:

vector<string> output(size(a) * size(b));

for_each(begin(output), end(output), [&, it = 0U](auto& i) mutable {
    i = a[it / size(b)] + ' ' + b[it % size(b)];
    ++it;
});

Live Example

EDIT:

We've initialized output with enough room to contain every combination of a and b. Then we'll step through each element of output and assign it.

We'll want to use the 1^st element of a for the first size(b) elements of output, and the 2^nd element of a for the second size(b) elements, and so on. So we'll do this by indexing with it / size(b). We'll want to combine that by iteration through b's elements.

it will move to the next index for each element of output but the indexing needs to wrap or it will be out of bounds when it == size(b), to do that we use it % size(b).

EDIT2:

In this question through benchmarking I'd discovered the phenomenon that modulo and division are expensive operations for iteration. I've done the same test here. For the purpose of isolating the algorithms I'm just doing the Cartesian summation on a vector<int> not vector<string>.

First off we can see the two algorithms result in differing assembly. My algorithm as written above requires 585 lines of assembly. 588 lines were required by my interpretation of MSalter's code

vector<string> output(size(testValues1) * size(testValues2));
auto i = begin(output);

std::for_each(cbegin(a), cend(a), [&](const auto& A) { std::for_each(cbegin(b), cend(b), [&](const auto& B) { *i++ = A + ' ' + B; }); });

I have placed a pretty solid benchmarking test here: http://ideone.com/1YpzIO In the test I've only got it set to do 100 tests yet MSalters' algorithm always wins. Locally using Visual Studio 2015 in release with 10,000,000 tests MSalters algorithm finishes in about 2/3 the time it takes mine.

Clearly modulo isn't a great method of indexing :(

edited May 23 '17 at 11:58

Community

1
1

answered Jun 09 '16 at 13:47

Jonathan Mee

37,899
23
129
288

This only writes `cend(a)-cbegin(a)` elements to `output`. – MSalters Jun 09 '16 at 13:50
@MSalters Thanks, I edited. That's 2 times I've misinterpreted what the OP asked for, and 2 times you've corrected me. Hopefully we won't have to make it 3 :( – Jonathan Mee Jun 09 '16 at 14:00
@BatCoder Sure I've updated with an explanation of what the lambda is doing. – Jonathan Mee Jun 09 '16 at 14:15
This is a 4-liner (not counting the declaration). The OP asked for 1-liner. – Super-intelligent Shade Jun 09 '16 at 14:21
@InnocentBystander It is true that lambdas are typically written all on one line. Particularly when they are wrapped in an STL algorithm: `for_each(begin(output), end(output), [&, it = 0U](auto& i) mutable { i = a[it / size(b)] + ' ' + b[it % size(b)]; ++it; });` – Jonathan Mee Jun 09 '16 at 14:30
@JonathanMee, no biggie. – Super-intelligent Shade Jun 09 '16 at 14:45
@JonathanMee, please also note that your program outputs in yoda notation. "apple good" instead of "good apple". – Super-intelligent Shade Jun 09 '16 at 14:46
@InnocentBystander That's just because i'd reversed the names on the input `vectors` in my example. I've switched them to match the OPs question, which I suppose is clearer. – Jonathan Mee Jun 09 '16 at 14:50
@JonathanMee, upvoted your answer, as it's closest to a 1-liner – Super-intelligent Shade Jun 09 '16 at 15:03
@InnocentBystander I'll take it! Though I've actually upvoted [MSalter's answer](http://stackoverflow.com/a/37727838/2642059) as in my mind his is a little better than mine, just cause it doesn't require the mods and divisions. – Jonathan Mee Jun 09 '16 at 15:08
1

Not really a surprise - division and modulo are fairly slow, in comparison to integer addition. And it's on the critical path for memory access; every read of main memory has to wait for the division and modulo to finish. – MSalters Jun 13 '16 at 07:28

score 4 · Answer 2 · edited May 23 '17 at 12:14

4

Here is a one-liner (copied from Jonathan Mee's answer posted here):

for(size_t i = 0, s = a.size(); i < output.size(); ++i) output[i] = b[i/s] + ' ' + a[i%s];

Full example here.

edited May 23 '17 at 12:14

Community

1
1

answered Jun 09 '16 at 15:00

Super-intelligent Shade

5,932
23
46

This modulus technique is really enlightening. Thanks – Abhishek Kumar Jun 09 '16 at 15:49
1

@AbhishekKumar You'll find a detailed explanation of it in [my answer](http://stackoverflow.com/a/37727815/2642059)... But, as I mention in my comments, it's not without expense. A modulo operation may take several cycles to complete. [MSalter's answer](http://stackoverflow.com/a/37727838/2642059) doesn't incur such expense, thus, I'll defer to his as the best answer. – Jonathan Mee Jun 09 '16 at 16:33
@JonathanMee: the cost of a modulo operation is going to completely disappear compared to the cost of concatenation and of the memory allocations that are going on anyway. Actually, it may very well be cheaper than the nested for due to the reduced branching. – Matteo Italia Jun 10 '16 at 06:28
@MatteoItalia Sadly that's not true :( You may find this interesting: http://stackoverflow.com/a/37299761/2642059 It's a bit of a read but was very enlightening to me. on iteration techniques particularly. – Jonathan Mee Jun 10 '16 at 10:46
division and modulo may be hundreds to tens of times slower than other basic arithmetic operations – phuclv Jun 10 '16 at 16:06
@JonathanMee: it all comes down to what you do in the loop; if you are concatenating `std::string`s like it seems here the difference disappears completely compared even just to the calls to the allocator. – Matteo Italia Jun 10 '16 at 16:29
@MatteoItalia We could do a lot more hand waving or we could run some benchmarks. Towit I've updated [my answer](http://stackoverflow.com/a/37727815/2642059) MSalter's algorithm runs in about 2/3 the time of the modulo algorithm. (Because the modulo does *not* disappear.) – Jonathan Mee Jun 10 '16 at 18:44
@LưuVĩnhPhúc You are correct. This seemed to be in question so I've added a benchmark to [my answer](http://stackoverflow.com/a/37727815/2642059). – Jonathan Mee Jun 10 '16 at 18:46
@JonathanMee: I only see benchmarks involving integers at your link. Incidentally, the assembly instructions count is a meaningless metric - just to make an example, unrolled loops are of course way longer than "regular ones", but are generally much faster. – Matteo Italia Jun 10 '16 at 19:00
@MatteoItalia Yeah, that's mentioned in the answer: "For the purpose of isolating the algorithms I'm just doing the Cartesian summation on a `vector` not `vector`." – Jonathan Mee Jun 10 '16 at 19:07
Also, the two implementations differ from other, way more important aspects - most importantly, you are calling `size()` and `operator[]` in the loop; generally the compiler isn't able to hoist out of the loop the calls to size, and I've seen very rarely generated code to remove the double indirection step required by operator[]. The range-based for goes straight with iterators (=actually plain pointers). As soon as I get to an actual computer I'll try to put together a benchmark that actually compares just the indexing. – Matteo Italia Jun 10 '16 at 19:08
@JonathanMee: and re-read what I said: when concatenating strings, all this is bullshit because a single call to the allocator dwarfs the indexing minutiae by orders of magnitude. But again, I suspect that writing a fair benchmark even the division method should fair mostly the same. A division on the index (=which is already in a register) on a current CPU is way faster than any memory access (and in this loop you have at least three memory accesses). The difference that may determine the winner will probably be due to the extra branch in one case and one extra memory access in the other. – Matteo Italia Jun 10 '16 at 19:13
@MatteoItalia If I understand you correctly you're saving that you think the compiler will *not* optimize away the fact that I access `a[i / size(b)]` `size(b)` times in a row? I mean if that's the case it would clearly be slower. I expect that the compiler *is* already handling that, but if we'd have to find a way to isolate the branch to modulo comparison. – Jonathan Mee Jun 10 '16 at 19:36
1

@JonathanMee: ok, so: hoisting manually anything out of the loop [doesn't give any particular advantage](http://ideone.com/9VvrlE), the nested loop wins; however, my main point stands - when you do this with strings, [they are indistinguishable](http://ideone.com/HVZmRS). – Matteo Italia Jun 10 '16 at 20:30
1

@JonathanMee: what is even more interesting is that the MSalter's version is *one order of magnitude* faster on biggish vectors on my machine. It turns out that with that code on a 64 bit machine g++ is able to perform an automatic vectorization on the loop, and emit a `movdqa`/`paddd`/`movups` sequence, which actually performs the summation 4 elements a time. Even better, with `-march=native` it uses AVX and goes straight to `vpaddd`/`vmovup`. – Matteo Italia Jun 10 '16 at 20:43
@MatteoItalia Wow, so you're earlier statement about the processor being able to hide the expense of the modulo within the evaluation of strings is absolutely correct. You clearly had a deeper understanding of this than I did from the get go. Thanks for sharing. – Jonathan Mee Jun 13 '16 at 10:57
1

@JonathanMee: well it's not really the processor "hiding" anything, it's more like "allocating a new `std::string` is so much costlier than anything else happening in that loop that this kind of modification is really just noise". Think of it like putting a box of cookies in the trunk of your car - it's not really going to affect your mileage =) . – Matteo Italia Jun 13 '16 at 11:47
OTOH, I was *seriously* underestimating how much that division could cost (in the lightweight `int` case) - looking at the profiler output it's clear that it introduces a hard data dependency that blocks the memory accesses as long as the division output is not ready (well, it's actually trivial, I should have realized it even without looking at the profiler) in a situation where the memory access pattern is nearly optimal (it's plain sequential access, the data is probably all prefetched in cache). Besides, the automatic vectorization is really the nail on the coffin on the division method. – Matteo Italia Jun 13 '16 at 11:50
All in all, it was an interesting trip, thank you for questioning my beliefs. =) – Matteo Italia Jun 13 '16 at 11:51

score 3 · Answer 3 · edited Jun 10 '16 at 17:30

3

There's no direct solution; I checked the whole of <algorithm>. None of the functions produce an output of length M*N.

What you can do is call std::for_each on the first range, using a lambda which calls std::for_each on the second range (!)

std::vector<std::string> a, b;
std::for_each(a.begin(), a.end(), 
  [&](std::string A) { std::for_each(b.begin(), b.end(),
      [A](std::string B) { std::cout << A << '/' << B << '\n';  }
);});

But that's just a nested loop in STL.

edited Jun 10 '16 at 17:30

Jonathan Mee

37,899
23
129
288

answered Jun 09 '16 at 13:48

MSalters

173,980
10
155
350

9

I think you have to be a particular kind of masochist to replace two simple range based for loops and a push back with this... ;) – Nim Jun 09 '16 at 13:50
This is a 4-liner (not counting the declaration). The OP asked for 1-liner. – Super-intelligent Shade Jun 09 '16 at 14:20
8

@InnocentBystander: I don't know how wide his monitor is ;) – MSalters Jun 09 '16 at 14:22
@MSalters, I s'pose he could have one of [these](http://www.lg.com/us/monitors/lg-29UM68-P-ultrawide-monitor) :) – Super-intelligent Shade Jun 09 '16 at 14:26
@MSalters, fixed your answer. It was outputting apple/good instead of good apple. – Super-intelligent Shade Jun 09 '16 at 14:42
@InnocentBystander I'm uncertain of the fascination with line count. And I'm more uncertain what defines a line. And I'm the most uncertain as to what defines a line when a lambda is involved. But i'd say this provides an apt answer to the OP's question. – Jonathan Mee Jun 09 '16 at 14:46
@InnocentBystander Your edit to this was bad, because it used the input `vectors` from my example. [Which were swapped](http://stackoverflow.com/questions/37727381/is-there-some-stl-function-to-get-cartesian-product-of-two-c-vectors/37729622?noredirect=1#comment62929905_37727815) I'm rolling it back. – Jonathan Mee Jun 10 '16 at 17:30
@MSalters Your answer is not well suited for iteratively assigning to a `vector` other than by `push_back` which is known to be slow. I've modified your algorithm in [my answer](http://stackoverflow.com/a/37727815/2642059) for benchmarking purposes. I'd welcome any comments. – Jonathan Mee Jun 10 '16 at 18:49
@JonathanMee: It's not so bad with `.reserve()`. – MSalters Jun 13 '16 at 07:30
@MSalters Clearly from all the research we've done this is the best solution, but it's also already the most typing. But it's sad that we have to also add the `reserve` or use an `iterator` to use the algorithm. A small price to pay for a great speedup really I suppose. – Jonathan Mee Jun 13 '16 at 11:05
1

@JonathanMee: It's the price paid for STL's failure to implement iterator ranges as first-class objects. If there would have been a concept `OutputRange`, it could have had a `.reserve` method. – MSalters Jun 13 '16 at 11:10

Is there some STL function to get cartesian product of two C++ vectors?

3 Answers3