2

I have two equal length vectors from which I want to remove elements based on a condition in one of the vectors. The same removal operation should be applied to both so that the indices match.

I have come up with a solution using std::erase, but it is extremely slow:

vector<myClass> a = ...;
vector<otherClass> b = ...;
assert(a.size() == b.size());
for(size_t i=0; i<a.size(); i++)
{
    if( !a[i].alive() )
    {

        a.erase(a.begin() + i);
        b.erase(b.begin() + i);
        i--;
    }
}

Is there a way that I can do this more efficiently and preferably using stl algorithms?

Frik
  • 1,054
  • 1
  • 13
  • 16
  • Out of interest, why can't you use `vector> a_and_b`? – Bathsheba Feb 09 '16 at 08:07
  • This is not that easy. Search for "zip iterator" to find questions such as [this](http://stackoverflow.com/questions/13840998) and just replace `std::sort` with `vector.erase(std::remove)`, maybe that works. – nwp Feb 09 '16 at 08:14

7 Answers7

8

If order doesn't matter you could swap the elements to the back of the vector and pop them.

for(size_t i=0; i<a.size();)
{
    if( !a[i].alive() )
    {
        std::swap(a[i], a.back());
        a.pop_back();
        std::swap(b[i], b.back());
        b.pop_back();
    }
    else
      ++i;
}

If you have to maintain the order you could use std::remove_if. See this answer how to get the index of the dereferenced element in the remove predicate:

a.erase(remove_if(begin(a), end(a),
        [b&](const myClass& d) { return b[&d - &*begin(a)].alive(); }),
        end(a));

b.erase(remove_if(begin(b), end(b), 
        [](const otherClass& d) { return d.alive(); }), 
        end(b));
Community
  • 1
  • 1
hansmaad
  • 18,417
  • 9
  • 53
  • 94
0

The reason it's slow is probably due to the O(n^2) complexity. Why not use list instead? As making a pair of a and b is a good idea too.

phg1024
  • 169
  • 6
  • I never met situations where list is useful in C++. – gomons Feb 09 '16 at 08:20
  • it's not the optimal solution, just thought it's a good trade off between performance and readability. – phg1024 Feb 09 '16 at 08:23
  • When you get element in list, you also get cache miss, I think it is bad trade off if you interested in performance. – gomons Feb 09 '16 at 08:29
  • depends on the size of the data structure actually. if the data structure is large enough, you are forced to get cache miss anyway. – phg1024 Feb 09 '16 at 08:30
0

A quick win would be to run the loop backwards: i.e. start at the end of the vector. This tends to minimise the number of backward shifts due to element removal.

Another approach would be to consider std::vector<std::unique_ptr<myClass>> etc.: then you'll be essentially moving pointers rather than values.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
0

I propose you create 2 new vectors, reserve memory and swap vectors content in the end.

vector<myClass> a = ...;
vector<otherClass> b = ...;
vector<myClass> new_a; 
vector<myClass> new_b;
new_a.reserve(a.size());
new_b.reserve(b.size());
assert(a.size() == b.size());
for(size_t i=0; i<a.size(); i++)
{
    if( a[i].alive() )
    {
        new_a.push_back(a[i]);
        new_b.push_back(b[i]);
    }
}
swap(a, new_a);
swap(b, new_b);

It can be memory consumed, but should work fast.

gomons
  • 1,946
  • 14
  • 25
0

erasing from the middle of a vector is slow due to it needing to reshuffle everything after the deletion point. consider using another container instead that makes erasing quicker. It depends on your use cases, will you be iterating often? does the data need to be in order? If you aren't iterating often, consider a list. if you need to maintain order, consider a set. if you are iterating often and need to maintain order, depending on the number of elements, it may be quicker to push back all alive elements to a new vector and set a/b to point to that instead.

Also, since the data is intrinsically linked, it seems to make sense to have just one vector containing data a and b in a pair or small struct.

makar
  • 497
  • 1
  • 4
  • 16
0

For performance reason need to use next. Use

vector<pair<myClass, otherClass>>

as say @Basheba and std::sort. Use special form of std::sort with comparision predicate. And do not enumerate from 0 to n. Use std::lower_bound instead, becouse vector will be sorted. Insertion of element do like say CashCow in this question: "how do you insert the value in a sorted vector?"

Community
  • 1
  • 1
oklas
  • 7,935
  • 2
  • 26
  • 42
0

I had a similar problem where I had two :

std::<Eigen::Vector3d> points;
std::<Eigen::Vector3d> colors;

for 3D pointclouds in Open3D and after removing the floor, I wanted to delete all points and colors if the points' z coordinate is greater than 0.05. I ended up overwriting the points based on the index and resizing the vector afterward.

bool invert = true;
std::vector<bool> mask = std::vector<bool>(points.size(), invert);
size_t pos = 0;
for (auto & point : points) {
    if (point(2) < CONSTANTS::FLOOR_HEIGHT) {
        mask.at(pos) = false;
    }
    ++pos;
}
size_t counter = 0;
for (size_t i = 0; i < points.size(); i++) {
    if (mask[i]) {
        points.at(counter) = points.at(i);
        colors.at(counter) = colors.at(i);
        ++counter;
    }
}
points.resize(counter);
colors.resize(counter);

This maintains order and at least in my case, worked almost twice as fast than the remove_if method from the accepted answer: for 921600 points the runtimes were:

  • 33 ms for the accepted answer
  • 17 ms for this approach.
Bjoern Urban
  • 328
  • 4
  • 14