1

I am writing a function to take the intersection of two sorted vector<size_t>s named a and b. The function iterates through both vectors removing anything from a that is not also in b so that whatever remains in a is the intersection of the two. Code here:

void intersect(vector<size_t> &a, vector<size_t> &b) {
  vector<size_t>::iterator aItr = a.begin();
  vector<size_t>::iterator bItr = b.begin();
  vector<size_t>::iterator aEnd = a.end();
  vector<size_t>::iterator bEnd = b.end();

  while(aItr != aEnd) {
    while(*bItr < *aItr) {
      bItr++;
      if(bItr == bEnd) {
        a.erase(aItr, aEnd);
        return;
      }
    }
    if (*aItr == *bItr) aItr++;
    else aItr = a.erase(aItr, aItr+1);
  }
}

I am getting a very bug. I am stepping the debugger and once it passes line 8 "while(*bItr < *aItr)" b seems to disappear. The debugger seems not to know that b even exists! When b comes back into existence after it goes back to the top of the loop it has now taken on the values of a!

This is the kind of behavior that I expect to see in dynamic memory error, but as you can see I am not managing any dynamic memory here. I am super confused and could really use some help.

Thanks in advance!

Mohammad Kanan
  • 4,452
  • 10
  • 23
  • 47
  • 4
    Sidenote: if you don't have a good reason for having to do all this yourself, there are [far easier ways](https://stackoverflow.com/questions/19483663/vector-intersection-in-c) :) – scohe001 Jun 05 '18 at 23:25
  • I would use std::set_intersection except that writes it into a new vector which I assume wouldn't be as efficient. – Ari Levisohn Jun 05 '18 at 23:27
  • 2
    On the contrary. Your algorithm is quadratic (because you try to delete from the middle of a vector, I'm not sure that you are even allowed to do it via an iterator in a loop) while the stock version would be linear in complexity. In general, prefer actual measurements to assumptions. Assumptions are wrong way too often. – KT. Jun 05 '18 at 23:33
  • @einpoklum What about "a.erase(aItr, aItr+1)" then? – KT. Jun 05 '18 at 23:35
  • @KT.: Oh yeah, [you're right](https://stackoverflow.com/a/37587453/1593077). – einpoklum Jun 05 '18 at 23:39
  • 1
    @AriLevisohn, perhaps you are compiling with optimizations which happen to remove the variable b completely from the code in question (as it is never used after `begin()` and the corresponding register is simply taken by other values). Try removing any `-O` compilation flags before debugging. – KT. Jun 05 '18 at 23:44
  • 1
    That behaviour is pretty normal when debugging optimized code - do you have optimization turned on? – user253751 Jun 06 '18 at 00:12
  • Your code do many assumptions that might not always be true… For example, if `b` is initially empty, the above code is incorrect. Also, you seems to ignore iterators invalidation rules. In particular, `end` iterator would become invalid when item are erased as the container is smaller. – Phil1970 Jun 06 '18 at 00:12

2 Answers2

3

Well, perhaps you should first address a major issue with your code: iterator invalidation.

See: Iterator invalidation rules here on StackOverflow.

When you erase an element in a vector, iterators into that vector at the point of deletion and further on are not guaranteed to be valid. Your code, though, assumes such validity for aEnd (thanks @SidS).

I would guess either this is the reason for what you're seeing, or maybe it's your compiler optimization flags which can change the execution flow, the lifetimes of variables which are not necessary, etc.

Plus, as @KT. notes, your erases can be really expensive, making your algorithm potentially quadratic-time in the length of a.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
1

You are making the assumption that b contains at least one element. To address that, you can add this prior to your first loop :

if (bItr == bEnd)
{
    a.clear();
    return;
}

Also, since you're erasing elements from a, aEnd will become invalid. Replace every use of aEnd with a.end().


std::set_intersection could do all of this for you :

void intersect(vector<size_t> &a, const vector<size_t> &b)
{
    auto it = set_intersection(a.begin(), a.end(), b.begin(), b.end(), a.begin());
    a.erase(it, a.end());
}
Sid S
  • 6,037
  • 2
  • 18
  • 24