1

In one of my projects it is necessary to remove certain elements from a std::vector<double> values. The indices I have to remove are given as a vector of intervals. For example {1,3} means, that I have to remove indices from 1 to 3 inclusive from values.

I can assume that the intervals given are mutually exclusive.

The code shown below illustrates, what the desired behavior should look like.

#include <iostream>
#include <vector>

int main(int argc, char** args) {
    // Intervals of indices I have to remove from values
    std::vector<std::pair<int, int>> intervals = { {1,3},{7,9},{13,13} }; 

    // Vector of arbitrary values. 
    std::vector<double> values = {4.2,6.4,2.3,3.4,9.1,2.3,0.6,1.2,0.3,0.4,6.4,3.6,1.4,2.5,7.5 }
    removeIntervals(values, intervals);
    // intervals should contain 4.2,9.1,2.3,0.6,6.4,3.6,1.4,7.5
}

What might be the shortest amount of code necessary to achieve this?

My best solution so far is:

 void removeIntervals(std::vector<double>& values, const std::vector < std::pair<int, int>>& intervals) {
    std::vector<bool> flags(values.size(), true);
    std::vector<double> ret;
    for (auto interval : intervals) {
        std:fill(flags.begin() + interval.first, flags.begin()+interval.second+1, false);
    }
    for (auto i = 0; i < values.size(); i++) {
        if (flags[i]) ret.push_back(values[i]);
    }
    values = ret;
 }

I can assume, that my intervals are non-overlapping and consecutive. It seems, that it boils down to perform erase from back to front.

void removeIntervals2(std::vector<double>& values, const std::vector < std::pair<int, int>>& intervals) {
    auto revIntervals = intervals;
    std::reverse(revIntervals.begin(), revIntervals.end());
    for (auto interval : revIntervals) {
        values.erase(std::begin(values) + interval.first, std::begin(values) + interval.second + 1);
    }
}
Aleph0
  • 5,816
  • 4
  • 29
  • 80

6 Answers6

2

Since you can assume that the intervals don't overlap and are increasing order, the solution is to start at the back (so that the indexes don't change) and remove each range in turn:

So for the minimum amount of code, which you asked for:

for (auto& it = intervals.rbegin(); it != intervals.rend(); ++it) {
  values.erase(values.begin() + it->first, std::next(values.begin() + it->second));

The down side of this is that this will involve lots of shuffling of the vector. Really what you would want to do is swap the last unswapped item at the end of the vector, with the item you want to remove, and then resize when you're finished to cut off the end; but this requires more code.

Saurav Sahu
  • 13,038
  • 6
  • 64
  • 79
UKMonkey
  • 6,941
  • 3
  • 21
  • 30
  • Many thanks. That seems to be the shortest possible solution, that I was looking for. So easy! :-) – Aleph0 Dec 12 '16 at 13:00
  • 1
    `erase()` erases a range of elements `[first,last)`. So your code would fail for inclusiveness. Ex: in case of `{13, 13}`, there won't be any deletion. i have edited your answer to get real last by using `std::next()`. – Saurav Sahu Dec 12 '16 at 13:04
  • This may be shortest in terms of *your code*, but not necessarily in terms of overall code or code generated. Moveover, it certainly is not the most efficient solution, as it requires a lot of copying/moving of elements. – Walter Dec 12 '16 at 13:08
  • @Walter as stated in my answer - there's a better algorithm that would do less shuffling in the general case, but the number of shuffles required may be minimal depending on what's being erased, and improvements are premature optimisation. – UKMonkey Dec 12 '16 at 15:15
1

Thought I'd post an answer that was a little more fault tolerant. If your intervals are larger than the input array for example if intervals had included {15, 15} this will still function correctly. Additionally this is faster than UKMonkey's solution because it does all the work in a single pass:

It has come to my attention that this code is implementation defined, and only works on Clang and Visual Studio 2015 Update 3:

values.resize(distance(begin(values), remove_if(begin(values), end(values), [i = 0U, it = cbegin(intervals), end = cend(intervals)](const auto&) mutable { return it != end && ++i > it->first && (i <= it->second || (++it, true)); })));

Live Example

You can accomplish the same thing in a for-loop though:

size_t write = 0U;
auto it = cbegin(intervals);

for (size_t read = 0U; read < size(values); ++read) {
    if (it == cend(intervals) || read < it->first) {
        values[write++] = values[read];
    } else if (read == it->second) {
        ++it;
    }
}

values.resize(write);

Live Example

If you are hooked on "the shortest amount of code necessary to achieve this," you can use my evil , from the lambda in the for-loop too:

for (size_t read = 0U; read < size(values); ++read) if (it == cend(intervals) || read < it->first || (read == it->second && (++it, false))) values[write++] = values[read];
Community
  • 1
  • 1
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
  • this solution seems broken to me. (1) `vector::erase` takes iterator arguments, not `int` indices; (2) after the first call to `vector::erase()` all iterators past the last erased one will be invalidated, breaking further calls of `vector::erase()` (including UB). – Walter Dec 12 '16 at 12:51
  • @Walter Good call. You were correct. I've updated my answer. – Jonathan Mee Dec 12 '16 at 14:43
1

The problem is non-trivial since after a first call to vector::erase() all indices/iterators to elements past the first erased one are invalidated, including further interval to be removed.

Therefore, using vector::erase() has to be done in descending order of the elements to be erased.

Another inconvenience originates from using int indices instead of iterators for the interval boundaries. Finally, vector::erase() copies (ore moves) all elements past the last removed elements to fill the gap. This preserves the order of values, but causes excessive copying (moving) in case of several intervals.

A more efficient way is to swap only the elements to be removed and finally down-size the vector.

Walter
  • 44,150
  • 20
  • 113
  • 196
  • Using a reverse iterator is still pretty trivial. – UKMonkey Dec 12 '16 at 12:58
  • Using a reverse_iterator is straightforward. 'Trivial' relates to the problem, not to the solution. Note also that if efficiency is of importance your solution is terrible. – Walter Dec 12 '16 at 14:46
1

What you surely want is a solution with not only short code but also good efficiency, minimizing the copies and shifts in the vector of values.

I would definitely go with the first part of your solution, that is falgging the positions to be kept or deleted.

std::vector<bool> flags(values.size(), true);
for (auto interval : intervals) {
    std:fill(flags.begin() + interval.first, flags.begin()+interval.second+1, false);
}

For the second part, the shortest and most efficient would be the erase/remove_if idiom:

 values.erase(std::remove_if(begin(values), end(values),
    [&](const auto& v) { return !flags[&v - &(*values.begin())];}),
  values.end());

The efficiency here is due to that remove_if will first mark the elements that need to be removed, then it will compact the vector by putting first the elements to stay and returning the position of the first element to remove. Finally, the erase will shrink the vector. From an algorithmic point of view, this solution is likely optimal. It should pay out for large vectors.

A.S.H
  • 29,101
  • 5
  • 23
  • 50
1

Well, the answers so far are all bad -- either making whole new vectors or requiring O(N^2) time -- so I'll add this one.

Instead of erasing the elements you don't want to keep, and shifting the rest every time, you move the ones you do want to keep into the proper position, and then just truncate the vector.

O(N) time and no extra space:

void removeIntervals(std::vector<double>& values, const std::vector < std::pair<int, int>>& intervals) {
    if (intervals.size()<=0)
        return;

    //keep the part before the first interval
    auto dest = values.begin()+intervals[0].first;

    for (size_t i=0; i<intervals.size(); ++i) {

        //copy the part to keep after each interval
        auto s = values.cbegin()+intervals[i].second+1;
        auto e = (i+i >= intervals.size() ?
                  values.cend() : 
                  values.cbegin()+intervals[i+1].first);
        while(s<e) {
            *dest++=*s++;
        }
    }
    values.erase(dest,values.end());
 }
Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
1

In complement of Matt Timmermans answer : It's not the question, but if you want to keep only values in intervals, in C++17, you can write :

void remove_if_not_in_interval(std::vector<double>& result, const std::vector<std::pair<int,int> >& intervals)
    {
      if (intervals.size() == 0)
        result.clear();

      auto dest = result.begin();
      for (auto [first, last] : intervals)
        {
          while(first!=last+1)
            {
              *dest++ = *(result.begin() + first++);
            }
        }

      result.erase(dest,result.end());
    }
Kafka
  • 720
  • 6
  • 21