
I've written a simple function that removes elements from a vector (V2) based upon the values of another vector (V1):

std::vector<int> V1={6,2,3,4};
std::vector<int> V2={9,4,8,6,7};

for(int i=0; i<V1.size(); i++)
{
   if(!V2.empty())
   {
     V2.erase(std::remove(V2.begin(),V2.end(),V1[i]),V2.end());
   }
}

My challenge is that the above needs to be O(n) complexity. Currently it is O(n*m), where n is V1.size() and m is V2.size().

N.B. The vectors are not and cannot be sorted, as the elements' original index values are required.

Questions:

  1. Am I right in saying 'V2.erase' is stopping this function from being O(n)? (Because it's a nested iteration within the for loop.)

  2. Is there a way around this, by performing the erase operation outside the loop?

Babra Cunningham
    If arrays are sorted, `std::set_difference` seems to be what you want. – Jarod42 Oct 18 '16 at 07:47
  • There's no way to get `O(n)` with your current presentation. – Hatted Rooster Oct 18 '16 at 07:50
  • Thanks @GillBates. Do you have any suggestions of how to get O(n) for this operation? – Babra Cunningham Oct 18 '16 at 07:51
  • @BabraCunningham: Are arrays sorted ? – Jarod42 Oct 18 '16 at 07:52
  • Add V1 to an unordered_set. Iterate over V2 check if the values are in the set and if so dont add them. The lookup is O(n). Iterate over the set and add the values to another vector. The space complexits increases but you have your O(n) – Samer Tufail Oct 18 '16 at 07:53
  • @SamerTufail `Add V1 to an unordered_set.` This itself can be worst case O(n^2). See the [documentation](http://www.cplusplus.com/reference/unordered_set/unordered_set/insert/) – Rohith R Oct 18 '16 at 07:55
  • @PRP it would be single element insertions so im not sure how this goes to O(n^2) – Samer Tufail Oct 18 '16 at 07:57
  • @SamerTufail `Single element insertions: Average case: constant. Worst case: linear in container size.` – Rohith R Oct 18 '16 at 07:58
  • @BabraCunningham When an element is deleted how do you expect the remaining elements to have their original indices ? Do you have a special value like `-1` that we can use to indicate that the element in that index in invalid ? – Rohith R Oct 18 '16 at 07:59
  • @PRP yes, i'm going to insert -1 to replace the deleted element as to not change the index values, perhaps std::swap can work? – Babra Cunningham Oct 18 '16 at 08:02
  • @PRP linear case when collisions occur, with a good enough hash function I dont see this as a problem – Samer Tufail Oct 18 '16 at 08:02
  • @SamerTufail When you talk about complexity you always assume worst case. It is mathematical. Obviously practically you may be correct. But the solution that the OP asked is regarding complexity analysis. – Rohith R Oct 18 '16 at 08:04
  • @PRP :) I agree you look at worst case what is important here is the dataset at hand as well. For what the is required I think it should be fine. Anyway before this turns into a 'whats the best container' argument here is some good info on the unordered containers http://bannalia.blogspot.co.uk/2013/10/insertion-complexity-of-c-unordered.html - – Samer Tufail Oct 18 '16 at 08:06
  • @SamerTufail yea, we can't comment on the exact complexity without looking at the exact problem statement. May be numbers are pretty small to use array as a hash table. **Babra Cunningham Please post the exact problem statement, or throw some light on the constraints of the elements of the vector.** – Rohith R Oct 18 '16 at 08:10

2 Answers


Why not use std::set_difference:

#include <algorithm>  // std::sort, std::set_difference
#include <iterator>   // std::back_inserter
#include <vector>

std::vector<int> test(
    std::vector<int> v1,
    std::vector<int>& v2)
{
    // The algorithm we use requires the ranges to be sorted:
    std::sort (v1.begin(), v1.end());
    std::sort (v2.begin(), v2.end());

    // our output vector: reserve space to avoid copying:
    std::vector<int> v3;
    v3.reserve (v2.size());

    // Use std::set_difference to copy the elements from
    // v2 that are not in v1 into v3:
    std::set_difference (
        v2.begin(), v2.end(),
        v1.begin(), v1.end(),
        std::back_inserter(v3));

    return v3;
}

If v1.size() == N and v2.size() == M, the runtime of this works out to roughly:

O(N log N) + O(M log M) + O(M + N)


OK, so then how about this:

void test2(
    std::vector<int> v1,
    std::vector<int>& v2)  // v2 by reference, so the caller sees the changes
{
    // We can still sort v1 without affecting the indices
    // in v2:
    std::sort (v1.begin(), v1.end());

    // Replace all the elements in v2 which appear in v1
    // with -1:
    std::replace_if (v2.begin(), v2.end(),
        [&v1] (int v)
        {
            return std::binary_search(v1.begin(), v1.end(), v);
        }, -1);
}

Not linear; estimating the complexity is left as an exercise for the OP.


A third alternative is this:

void test3(
    std::vector<int> v1,
    std::vector<int>& v2)
{
    // We can still sort this without affecting the indices
    // in v2:
    std::sort (v1.begin(), v1.end());

    auto ret = std::stable_partition (
        v2.begin(), v2.end(),
        [&v1] (int v)
        {
            return !std::binary_search(v1.begin(), v1.end(), v);
        });

    v2.erase (ret, v2.end());
}

Again, not linear, but options...

Nik Bougalis
  • I don't see how you can keep the "original index values" if you're deleting stuff. Can we see an _actual_ statement of the problem maybe? – Nik Bougalis Oct 18 '16 at 08:02
  • @NikBougalis Call it "stable remove" - the original *order* in which the remaining (post-removal) elements is preserved (similar with [stable sort](http://stackoverflow.com/questions/1517793/stability-in-sorting-algorithms)) – Adrian Colomitchi Oct 18 '16 at 08:13
#include <unordered_set>
#include <vector>

std::vector<int> v1 = {6, 2, 3, 4};
std::vector<int> v2 = {9, 4, 8, 6, 7};

// it would take a truly pathological case for lookups to degrade to O(N)
std::unordered_set<int> v1_hashed(v1.begin(), v1.end());

for (
    size_t v2ix = v2.size() - 1;
    v2ix < v2.size();  // unsigned wrap-around past zero terminates the loop
    v2ix--
) {
    // generally, an O(1) lookup is assumed here
    if (v1_hashed.find(v2[v2ix]) != v1_hashed.end()) {
        // erase() is costly because it moves the trailing elements
        // down; this is why the cycle runs in reverse (fewer
        // elements left to move).
        v2.erase(v2.begin() + v2ix);
    }
}

// O(N) searches + O(M) hashing of v1 on average; note the erases
// themselves can still make the worst case quadratic in v2.size().
Adrian Colomitchi