21

I've a std::vector<int> and I need to remove all elements at given indexes (the vector usually has high dimensionality). I would like to know, which is the most efficient way to do such an operation having in mind that the order of the original vector should be preserved.

Although, I found related posts on this issue, some of them needed to remove one single element or multiple elements where the remove-erase idiom seemed to be a good solution. In my case, however, I need to delete multiple elements and since I'm using indexes instead of direct values, the remove-erase idiom can't be applied, right? My code is given below and I would like to know if it's possible to do better than that in terms of efficiency?

bool find_element(const vector<int> & vMyVect, int nElem){
    return (std::find(vMyVect.begin(), vMyVect.end(), nElem)!=vMyVect.end()) ? true : false;
}

void remove_elements(){

    srand ( time(NULL) );

    int nSize = 20;
    std::vector<int> vMyValues;
    for(int i = 0; i < nSize; ++i){
            vMyValues.push_back(i);
    }

    int nRandIdx;
    std::vector<int> vMyIndexes;
    for(int i = 0; i < 6; ++i){
        nRandIdx = rand() % nSize;
        vMyIndexes.push_back(nRandIdx);
    }

    std::vector<int> vMyResult;
    for(int i=0; i < (int)vMyValues.size(); i++){
        if(!find_element(vMyIndexes,i)){
            vMyResult.push_back(vMyValues[i]);
        }
    }
}
Evg
  • 25,259
  • 5
  • 41
  • 83
Peter
  • 281
  • 1
  • 4
  • 8
  • 2
    The problem is, that the indices won't be valid anymore after the first element is erased, same with iterators (you can get an iterator from an index with `vec.begin() + index`). – Xeo Jul 07 '11 at 11:11
  • @Georg, the code does what it should. The idea is to remove the `element` that is at a given `position`. In my code, an `element` is represented by `vMyValues` and the `position` by the `vMyIndexes`. – Peter Jul 07 '11 at 11:39
  • I think i had the same blind spot as Andy while reading... your current code doesn't remove in place so that issue isn't there ;) – Georg Fritzsche Jul 07 '11 at 11:45
  • If efficiency is an issue, and if you are doing a lot of deleting-by-position (and maybe also inserting), consider using other container than `std::vector` --- `sdt::list` (linked-list container) or `std::set` (container where the key is the values themselves) – Itamar Katz Jul 07 '11 at 11:50

6 Answers6

23

I think it could be more efficient, if you just just sort your indices and then delete those elements from your vector from the highest to the lowest. Deleting the highest index on a list will not invalidate the lower indices you want to delete, because only the elements higher than the deleted ones change their index.

If it is really more efficient will depend on how fast the sorting is. One more pro about this solultion is, that you don't need a copy of your value vector, you can work directly on the original vector. code should look something like this:

... fill up the vectors ...

sort (vMyIndexes.begin(), vMyIndexes.end());

for(int i=vMyIndexes.size() - 1; i >= 0; i--){
    vMyValues.erase(vMyValues.begin() + vMyIndexes[i])
}
Christian Goltz
  • 746
  • 3
  • 7
  • +1 for elegant solution. I'm not sure about its efficiency, because erasing from highest to lowest you'll move tail elements many times – Andriy Tylychko Jul 07 '11 at 12:18
  • @Andy T.: no matter how you erase, you will always move all the elements "after" the removed element. By erasing from the end, you minimize the number of elements to move for each "index", and therefore it is the most efficient "in-place" solution. – Matthieu M. Jul 07 '11 at 16:53
  • @Matthieu: right, order doesn't matter, but I don't agree about `the most efficient in-place solution`, please see my post – Andriy Tylychko Jul 07 '11 at 16:59
  • @AndyT: right, my comment on efficiency should be reduced to those strategies that rely on calling `erase`. They are other strategies. – Matthieu M. Jul 07 '11 at 17:10
  • Consider the following vectors of indices vInt : (1)If vInt has multiplicity { 1, 2, 2, 3 } you'll delete wrong elements (so std::unique has to be called on vInt) (2) If vInt = {1, 1, .. 1} and sized equally to vMyValues you'll delete the whole container (3) Even worse if vInt contains indices out of the range of the container {-2, -1, INT_MAX } you'll delete what elements? (so another preprocessing step is needed for range checking). The solution won't look that elegant once it's bug free, it is indeed smart but don't use it in production code (your can refer to my answer for an alternative). – Nikos Athanasiou Feb 09 '14 at 11:55
  • 1
    this method is very slow. I did not recommend to use this. – tqjustc Apr 04 '14 at 07:13
  • 1
    Yes, agree with tqjustc, this method is very very slow compared to others and it becomes very important when working on large datasets. For instance this takes nearly 8 mins to run where other methods run in 0.03 secs. – kory Apr 02 '19 at 10:38
7

Erase-remove multiple elements at given indices

Update: after the feedback on performance from @kory, I've modified the algorithm not to use flagging and move/copy elements in chunks (not one-by-one).

Notes:
  • indices need to be sorted and unique
  • uses std::move (replace with std::copy for c++98):

Github Live example

Code:
template <class ForwardIt, class SortUniqIndsFwdIt>
inline ForwardIt remove_at(
    ForwardIt first,
    ForwardIt last,
    SortUniqIndsFwdIt ii_first,
    SortUniqIndsFwdIt ii_last)
{
    if(ii_first == ii_last) // no indices-to-remove are given
        return last;
    typedef typename std::iterator_traits<ForwardIt>::difference_type diff_t;
    typedef typename std::iterator_traits<SortUniqIndsFwdIt>::value_type ind_t;
    ForwardIt destination = first + static_cast<diff_t>(*ii_first);
    while(ii_first != ii_last)
    {
        // advance to an index after a chunk of elements-to-keep
        for(ind_t cur = *ii_first++; ii_first != ii_last; ++ii_first)
        {
            const ind_t nxt = *ii_first;
            if(nxt - cur > 1)
                break;
            cur = nxt;
        }
        // move the chunk of elements-to-keep to new destination
        const ForwardIt source_first =
            first + static_cast<diff_t>(*(ii_first - 1)) + 1;
        const ForwardIt source_last =
            ii_first != ii_last ? first + static_cast<diff_t>(*ii_first) : last;
        std::move(source_first, source_last, destination);
        // std::copy(source_first, source_last, destination) // c++98 version
        destination += source_last - source_first;
    }
    return destination;
}
Usage example:
std::vector<int> v = /*...*/; // vector to remove elements from
std::vector<int> ii = /*...*/; // indices of elements to be removed

// prepare indices
std::sort(ii.begin(), ii.end());
ii.erase(std::unique(ii.begin(), ii.end()), ii.end());

// remove elements at indices
v.erase(remove_at(v.begin(), v.end(), ii.begin(), ii.end()), v.end());
Community
  • 1
  • 1
AMA
  • 4,114
  • 18
  • 32
  • 1
    This runs fairly quick, 0.47 secs on my dataset of millions, others are slower around 5-10 mins and only Andriy Tylychko's is faster at half of the time. – kory Apr 02 '19 at 11:02
  • @kory I've updated the algorithm in answer to avoid flagging and move/copy elements in chunks (not one-by-one) – AMA Apr 03 '19 at 11:08
  • 1
    yes, everything here is working and this is now the fastest overall algorithm, congrats! – kory Apr 04 '19 at 14:44
7

to avoid moving the same elements many times, we can move them by ranges between deleted indexes

// fill vMyIndexes, take care about duplicated values
vMyIndexes.push_back(-1); // to handle range from 0 to the first index to remove
vMyIndexes.push_back(vMyValues.size()); // to handle range from the last index to remove and to the end of values
std::sort(vMyIndexes.begin(), vMyIndexes.end());
std::vector<int>::iterator last = vMyValues.begin();
for (size_t i = 1; i != vMyIndexes.size(); ++i) {
    size_t range_begin = vMyIndexes[i - 1] + 1;
    size_t range_end = vMyIndexes[i];
    std::copy(vMyValues.begin() + range_begin, vMyValues.begin() + range_end,   last);
    last += range_end - range_begin;
}
vMyValues.erase(last, vMyValues.end());

P.S. fixed a bug, thanks to Steve Jessop that patiently tried to show me it

Community
  • 1
  • 1
Andriy Tylychko
  • 15,967
  • 6
  • 64
  • 112
  • @Andy T, I'm not sure if it works properly in situations where the indexes to be removed are contiguous, e.g. (13,14,15,16) – Peter Jul 07 '11 at 13:47
  • @Peter: it should, since `std::copy` will be passed two iterators that are equal, and hence will copy nothing. I'm worried about the first time through the loop, though. We copy the *second* range that we're supposed to keep, from indexes `vMyIndexes[0]+1` to `vMyIndexes[1]`, so I think we lose some values at the front if the first value in `vMyIndexes` is not 0. Possibly we should put a -1 at the start of `vMyIndexes`, or equivalently add `vMyIndexes[0]` to `last` before we start. – Steve Jessop Jul 07 '11 at 15:07
  • @Steve: I put `vMyValues.size()` at the end, it's similar to what you propose – Andriy Tylychko Jul 07 '11 at 15:11
  • Actually, my idea of sticking in a -1 is not valid, since `last` isn't allowed to be in the range `[vMyValues.begin() + range_begin, vMyValues.begin() + range_end)` according to the rules of `std::copy`. So `last` needs to not start at the beginning. – Steve Jessop Jul 07 '11 at 15:13
  • @Andy T: indeed, you've solved the problem at the end but you haven't solved the equivalent problem at the start. – Steve Jessop Jul 07 '11 at 15:14
  • @Steve: the start doesn't need to be solved, the range from beginning of values up to the first index to delete doesn't need to be touched, please correct me. do you have a test case? – Andriy Tylychko Jul 07 '11 at 15:20
  • Test case is `myIndexes` containing 1 (and nothing else). Maybe I've miscalculated, apologies if so, but I think that your code will copy the value from index 2 to index 0. – Steve Jessop Jul 07 '11 at 15:46
  • the code doesn't handle only the case when myIndex is empty. when there's single element artificial one (`myValue.size()`) will be added and should algo should work fine – Andriy Tylychko Jul 07 '11 at 16:04
  • @SteveJessop let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/1224/discussion-between-andy-t-and-steve-jessop) – Andriy Tylychko Jul 07 '11 at 16:04
  • http://www.ideone.com/1hHNR. Output is 0, which is wrong, only 2 elements should be removed. Change the values in `vMyIndexes` to 0 and 1 instead of 8 and 9, and it correctly outputs 8. Btw, well done for writing code that compiled first time, without actually compiling it yourself :-) – Steve Jessop Jul 07 '11 at 22:23
  • @Steve: oh, I see it at long last, didn't hear you, sorry. Fixed, thanks – Andriy Tylychko Jul 08 '11 at 08:36
  • This is the fastest method overall, 0.24 s compared to 0.4 s up to 5-10 mins for other solutions on datasets containing millions. – kory Apr 02 '19 at 11:48
2

What you can do is split the vector (actually any non-associative container) in two groups, one corresponding to the indices to be erased and one containing the rest.

template<typename Cont, typename It>
auto ToggleIndices(Cont &cont, It beg, It end) -> decltype(std::end(cont))
{
    int helpIndx(0);
    return std::stable_partition(std::begin(cont), std::end(cont), 
        [&](typename Cont::value_type const& val) -> bool {
            return std::find(beg, end, helpIndx++) != end;
    });
}

you can then delete from (or up to) the split point to erase (keep only) the elements corresponding to the indices

std::vector<int> v;
v.push_back(0);
v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(4);
v.push_back(5);

int ar[] = { 2, 0, 4 };
v.erase(ToggleIndices(v, std::begin(ar), std::end(ar)), v.end());
  • If the 'keep only by index' operation is not needed you can use remove_if insted of stable_partition (O(n) vs O(nlogn) complexity)
  • To work for C arrays as containers the lambda function should be [&](decltype(*(std::begin(cont))) const& val) -> bool { return std::find(beg, end, helpIndx++) != end; } but then the .erase() method is no longer an option
Nikos Athanasiou
  • 29,616
  • 15
  • 87
  • 153
  • This is very slow in my testing, taking around 5 mins to run compared to the 0.24 secs for Andriy Tylychko's answer. – kory Apr 02 '19 at 10:57
1

If you want to ensure that every element is only moved once, you can simply iterate through each element, copy those that are to remain into a new, second container, do not copy the ones you wish to remove, and then delete the old container and replace it with the new one :)

Jeff
  • 11
  • 1
1

This is an algorithm based on Andriy Tylychko's answer so that this can make it easier and faster to use the answer, without having to pick it apart. It also removes the need to have -1 at the beginning of the indices list and a number of items at the end. Also some debugging code to make sure the indices are valid (sorted and valid index into items).

template <typename Items_it, typename Indices_it>
auto remove_indices(
    Items_it items_begin, Items_it items_end
  , Indices_it indices_begin, Indices_it indices_end
)
{
    static_assert(
      std::is_same_v<std::random_access_iterator_tag
        , typename std::iterator_traits<Items_it>::iterator_category>
      , "Can't remove items this way unless Items_it is a random access iterator");

    size_t indices_size = std::distance(indices_begin, indices_end);
    size_t items_size = std::distance(items_begin, items_end);
    if (indices_size == 0) {
        // Nothing to erase
        return items_end;
    }

    // Debug check to see if the indices are already sorted and are less than
    // size of items.
    assert(indices_begin[0] < items_size);
    assert(std::is_sorted(indices_begin, indices_end));

    auto last = items_begin;
    auto shift = [&last, &items_begin](size_t range_begin, size_t range_end) {
        std::copy(items_begin + range_begin, items_begin + range_end, last);
        last += range_end - range_begin;
    };

    size_t last_index = -1;
    for (size_t i = 0; i != indices_size; ++i) {
        shift(last_index + 1, indices_begin[i]);
        last_index = indices_begin[i];
    }
    shift(last_index + 1, items_size);
    return last;
}

Here is an example of usage:

template <typename T>
std::ostream& operator<<(std::ostream& os, std::vector<T>& v)
{
    for (auto i : v) {
        os << i << " ";
    }
    os << std::endl;
    return os;
}

int main()
{
    using std::begin;
    using std::end;
    std::vector<int> items = { 1, 3, 6, 8, 13, 17 };
    std::vector<int> indices = { 0, 1, 2, 3, 4 };

    std::cout << items;
    items.erase(
          remove_indices(begin(items), end(items), begin(indices), end(indices))
        , std::end(items)
    );
    std::cout << items;

    return 0;
}

Output:

1 3 6 8 13 17 
17 

The headers required are:

#include <iterator>
#include <vector>
#include <iostream> // only needed for output
#include <cassert>
#include <type_traits>

And a Demo can be found on godbolt.org.

Adrian
  • 10,246
  • 4
  • 44
  • 110