Copying non-sequential columns from an array into another array C++ and removing duplicates based on 1 column

Question

I want to copy columns from a std::vector<std::vector<double> > into another std::vector<std::vector<double> > in C++. This question answers that but only deals with the case where all the columns are in a sequence. In my case, the inner std::vector has 8 elements {C1, C2, C3, C4, C5, C6, C7, C8}. The new object needs to contain {C4, C5, C6, C8} and all the rows. Is there a way to do it directly?

After this step, I will be manipulating this to remove the duplicate rows and write it into a file. Also, please suggest which activity to do first (deleting "columns" or duplicates).

Just to put things in perspective: the outer std::vector has ~2 billion elements and, after removing duplicates, I will end up with ~50 elements. So a method that is fast and memory-efficient is highly preferred.

You're going to have to do an O(N) copy operation. Arrays are stored in row-major order, so there isn't a neat trick for this; you have to visit each element and make the copy. — NathanOliver, Oct 06 '22 at 20:45
Arrays don't really have columns. "2D" arrays are really just arrays of arrays, each laid out sequentially, one after the other, in memory. What you see as a "column" when the array is displayed in a 2D table is just an illusion. Those elements are not actually adjacent to each other in memory, so you can't efficiently bulk-copy them. You just have to copy them one-by-one. — Miles Budnek, Oct 06 '22 at 20:49
Thank you @MilesBudnek. Now I sort of understand how the arrays function in C++ — rnp, Oct 06 '22 at 21:06
Processing huge data sets can rarely be done in a way that is efficient in both memory and computing power, so a trade-off between the two usually has to be made. Without measuring the actual performance (which you really should do when performance matters), I'd guess that filtering out duplicate elements on the go (before adding them to the result vector) would be faster. — MartinBG, Oct 06 '22 at 22:34
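
The filter-as-you-go idea from the last comment could be sketched roughly as below. This is only an illustration under stated assumptions: the function name project_unique and its key parameter are hypothetical, duplicates are decided by exact equality of one column of the original row (the question does not say which column, so index 3 is just a placeholder default), and rows with fewer than 8 elements are skipped. Because only rows with an unseen key are stored, the result holds roughly as many rows as there are unique keys and never the full projected copy.

```cpp
#include <array>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper (not from the thread): keeps columns C4, C5, C6, C8
// (indices 3, 4, 5, 7) and drops every row whose value in column `key`
// (an index into the ORIGINAL row) has already been seen.
std::vector<std::vector<double>> project_unique(
    const std::vector<std::vector<double>>& rows, std::size_t key = 3) {
    static constexpr std::array<std::size_t, 4> cols{3, 4, 5, 7};

    std::set<double> seen;  // key values encountered so far (exact comparison)
    std::vector<std::vector<double>> result;

    for (const auto& row : rows) {
        // Assumption: short rows are skipped rather than treated as an error.
        if (row.size() < 8 || key >= row.size()) continue;

        // insert().second is false when the key was already present.
        if (!seen.insert(row[key]).second) continue;

        std::vector<double> out;
        out.reserve(cols.size());
        for (std::size_t c : cols) out.push_back(row[c]);
        result.push_back(std::move(out));
    }
    return result;
}
```

Calling project_unique(orig) on the data from the question would return only the first row seen for each distinct key value. With ~50 unique keys the std::set stays tiny, so the run time is dominated by the single pass over the input; whether this beats copying first and deduplicating afterwards is something to measure on the real data.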

Ted Lyngmo · Accepted Answer · 2022-10-06T21:47:03.840

I would use std::transform.

It could look like this:

#include <algorithm> // transform
#include <vector>
#include <iostream>
#include <iterator>  // back_inserter

int main() {
    std::vector<std::vector<double>> orig{
        {1,2,3,4,5,6,7,8},
        {11,12,13,14,15,16,17,18},
    };

    std::vector<std::vector<double>> result;
    result.reserve(orig.size());

    std::transform(orig.begin(), orig.end(), std::back_inserter(result),
        [](auto& iv) -> std::vector<double> {
            return {iv[3], iv[4], iv[5], iv[7]};
        });

    // print the result:
    for(auto& inner : result) {
        for(auto val : inner) std::cout << val << ' ';
        std::cout << '\n';
    }
}

Output:

4 5 6 8 
14 15 16 18

Note: If any of the inner vector<double>s in orig has fewer than 8 elements, the transformation will access that vector out of bounds (with undefined behavior as a result), so make sure they all have the required number of elements.
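
If the row lengths cannot be guaranteed, one possible guard is to index with at() instead of operator[]. The sketch below assumes that throwing an exception on a short row is acceptable; the function name select_columns_checked is made up for illustration and is not part of the answer above.

```cpp
#include <algorithm>  // transform
#include <iterator>   // back_inserter
#include <stdexcept>  // out_of_range (thrown by vector::at)
#include <vector>

// Hypothetical helper: same projection as above, but vector::at() throws
// std::out_of_range for a row that is too short instead of invoking
// undefined behavior.
std::vector<std::vector<double>> select_columns_checked(
    const std::vector<std::vector<double>>& orig) {
    std::vector<std::vector<double>> result;
    result.reserve(orig.size());

    std::transform(orig.begin(), orig.end(), std::back_inserter(result),
        [](const std::vector<double>& iv) -> std::vector<double> {
            return {iv.at(3), iv.at(4), iv.at(5), iv.at(7)};
        });
    return result;
}
```

The bounds check adds a small cost per access, so for billions of rows it may be preferable to validate each row's size once and keep operator[] for the actual copy.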

Or using C++20 ranges to create the resulting vector from a transformation view:

#include <iostream>
#include <ranges>  // views::transform
#include <vector>

int main() {
    std::vector<std::vector<double>> orig{
        {1, 2, 3, 4, 5, 6, 7, 8},
        {11, 12, 13, 14, 15, 16, 17, 18},
    };

    auto trans = [](auto&& iv) -> std::vector<double> {
        return {iv[3], iv[4], iv[5], iv[7]};
    };

    auto tview = orig | std::views::transform(trans);
    
    std::vector<std::vector<double>> result(tview.begin(), tview.end());

    // print the result:
    for (auto& inner : result) {
        for (auto val : inner) std::cout << val << ' ';
        std::cout << '\n';
    }
}

What I have is a vector of vectors, std::vector<std::vector<double> >. Would this work in this case too? — rnp, Oct 06 '22 at 21:04
@rnp Not out of the box since a `vector` is empty by default. Is that your real case? If so, update the question and I'll update the answer accordingly. — Ted Lyngmo, Oct 06 '22 at 21:06
@rnp Ok, I updated the answer too (took some time, SO died on me multiple times). — Ted Lyngmo, Oct 06 '22 at 21:29
