I want to copy columns from a std::vector<std::vector<double> > into another std::vector<std::vector<double> > in C++. This question answers that but only deals with the case where all the columns are in a sequence. In my case, the inner std::vector has 8 elements {C1, C2, C3, C4, C5, C6, C7, C8}. The new object needs to contain {C4, C5, C6, C8} and all the rows. Is there a way to do it directly?

After this step, I will be manipulating this to remove the duplicate rows and write it into a file. Also, please suggest which activity to do first (deleting "columns" or duplicates).

To put things in perspective: the outer std::vector has ~2 billion elements, and after removing duplicates I will end up with ~50 elements. So a method that is fast and memory-efficient is highly preferred.

Nicol Bolas
rnp
  • You're going to have to do an O(N) copy operation. Arrays are stored in row-major order, so there isn't a nice little trick to do this other than visiting each element and making the copy. – NathanOliver Oct 06 '22 at 20:45
  • Arrays don't really have columns. "2D" arrays are really just arrays of arrays, each laid out sequentially, one after the other, in memory. What you see as a "column" when the array is displayed in a 2D table is just an illusion. Those elements are not actually adjacent to each other in memory, so you can't efficiently bulk-copy them. You just have to copy them one-by-one. – Miles Budnek Oct 06 '22 at 20:49
  • Thank you @MilesBudnek. Now I sort of understand how arrays work in C++. – rnp Oct 06 '22 at 21:06
  • Processing huge data sets can rarely be done in a way that is efficient in both memory and computing power, so a trade-off between the two usually has to be made. Without measuring the actual performance (which you really should do when performance matters), I'd guess that filtering out duplicate elements on the go (before adding them to the result vector) would be faster. – MartinBG Oct 06 '22 at 22:34

1 Answer

I would use std::transform.

It could look like this:

#include <algorithm> // transform
#include <vector>
#include <iostream>
#include <iterator>  // back_inserter

int main() {
    std::vector<std::vector<double>> orig{
        {1,2,3,4,5,6,7,8},
        {11,12,13,14,15,16,17,18},
    };

    std::vector<std::vector<double>> result;
    result.reserve(orig.size());

    std::transform(orig.begin(), orig.end(), std::back_inserter(result),
        [](auto& iv) -> std::vector<double> {
            return {iv[3], iv[4], iv[5], iv[7]};
        });

    // print the result:
    for(auto& inner : result) {
        for(auto val : inner) std::cout << val << ' ';
        std::cout << '\n';
    }
}

Output:

4 5 6 8 
14 15 16 18 

Note: If any of the inner vector<double>s in orig has fewer than 8 elements, the transformation will access that vector out of bounds (with undefined behavior as a result), so make sure they all have the required number of elements.
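
If the row lengths cannot be guaranteed, one possible variation is to use the bounds-checked at() instead of operator[], so a short row raises std::out_of_range rather than invoking undefined behavior. The sketch below is only an illustration: the helper name select_columns and its cols parameter are hypothetical and not part of the code above, and the extra check costs a little time per element.

```cpp
#include <algorithm>  // transform
#include <cstddef>    // size_t
#include <iterator>   // back_inserter
#include <stdexcept>  // out_of_range (thrown by vector::at)
#include <vector>

// Hypothetical helper: copies the columns listed in `cols` (0-based)
// from every row of `orig` into a new vector of rows.
std::vector<std::vector<double>> select_columns(
    const std::vector<std::vector<double>>& orig,
    const std::vector<std::size_t>& cols) {
    std::vector<std::vector<double>> result;
    result.reserve(orig.size());

    std::transform(orig.begin(), orig.end(), std::back_inserter(result),
        [&cols](const std::vector<double>& row) {
            std::vector<double> out;
            out.reserve(cols.size());
            // at() throws std::out_of_range if the row is too short
            for (std::size_t c : cols) out.push_back(row.at(c));
            return out;
        });

    return result;
}
```

For the columns in the question, the call would be select_columns(orig, {3, 4, 5, 7}).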

Or using C++20 ranges to create the resulting vector from a transformation view:

#include <iostream>
#include <ranges>  // views::transform
#include <vector>

int main() {
    std::vector<std::vector<double>> orig{
        {1, 2, 3, 4, 5, 6, 7, 8},
        {11, 12, 13, 14, 15, 16, 17, 18},
    };

    auto trans = [](auto&& iv) -> std::vector<double> {
        return {iv[3], iv[4], iv[5], iv[7]};
    };

    auto tview = orig | std::views::transform(trans);
    
    std::vector<std::vector<double>> result(tview.begin(), tview.end());

    // print the result:
    for (auto& inner : result) {
        for (auto val : inner) std::cout << val << ' ';
        std::cout << '\n';
    }
}
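
The question also asks whether to drop columns or duplicates first. Following the comment that suggests filtering duplicates on the go, one option is to do both in a single pass: project each row onto the four columns and insert it into a std::set, so the full reduced copy is never materialized. This is a minimal sketch under assumptions, not a measured solution: the alias Row4 and the function unique_selected_rows are hypothetical names, rows are compared for exact equality of the doubles, and the data is assumed to contain no NaN values (which would break the ordering std::set relies on).

```cpp
#include <array>
#include <set>
#include <vector>

// Hypothetical alias: one reduced row holding {C4, C5, C6, C8}.
using Row4 = std::array<double, 4>;

// Hypothetical helper: selects the four columns and removes duplicate
// rows in one pass. std::array provides a lexicographic operator<,
// so it can be used directly as a std::set key.
std::set<Row4> unique_selected_rows(
    const std::vector<std::vector<double>>& orig) {
    std::set<Row4> unique;
    for (const auto& row : orig) {
        // at() throws std::out_of_range if a row has fewer than 8 elements
        unique.insert(Row4{row.at(3), row.at(4), row.at(5), row.at(7)});
    }
    return unique;
}
```

With only ~50 distinct rows expected, the set stays tiny, so the extra memory is negligible compared with building a second 2-billion-row vector first. Whether this is actually faster than copying and then deduplicating should be confirmed by measuring.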
Ted Lyngmo