2

I am trying to remove duplications in a vector of struct and this duplication is determined by qieid in the struct:

struct infotable
{
  int qieid;
  int fid;
  int valid;
};

And I write a subfunction to do this:

void erase_duplicate(vector<infotable> &info)
{
  vector<infotable>::iterator it0,it1;

  for(it0 = info.begin();it0 != info.end()-1; it0++)
  {
    //it1 = find(qieidarray1.begin(),it0,*it0);

    for(it1 = it0 +1;it1 != info.end(); it1++)
    {
      if((*it1).qieid == (*it0).qieid)
        it1 = info.erase(it1);
    }
  }
}

But I have some problems with this piece of code. When the number of struct in the vector is small, it works fine. But when I have about more than 3000 structs in the vector, the program will go wrong and I have:

segmentation fault 11

showed on my screen. I know it is a memory access problem and I may access to somewhere I should not touch because I have so many elements in the vector. But is there anyway I can improve my code in order to get a better performance(run more elements)?

lethal-guitar
  • 4,438
  • 1
  • 20
  • 40
Hua Wei
  • 53
  • 6
  • 1
    Have you considered using a set instead of a vector? You will need to write a comparison to tell the set that two structs with the same qieid are equal. – Katu May 22 '14 at 07:07
  • @katutxakurra That is a good idea! I have never used set before. Thx! – Hua Wei May 22 '14 at 07:20

3 Answers3

4

when you use erase, it will invalidate iterators.

for(it1 = it0 +1;it1 != info.end(); )
{
   if((*it1).qieid == (*it0).qieid)
      it1 = info.erase(it1);
   else
      it1++;
}
tofi9
  • 5,775
  • 4
  • 29
  • 50
Jerry YY Rain
  • 4,134
  • 7
  • 35
  • 52
4

You could also use <algorithm> functions to do it, std::unique in particular.

First, sort by qieid, then use unique, which moves the "deleted" elements at the end and returns a new iterator, which you can then use to actually erase these elements.

std::sort(info.begin(), info.end(), [](const infotable& a, const infotable& b) {
    return a.qieid < b.qieid;
});
auto newIt = std::unique(info.begin(), info.end(), [](const infotable& a, const infotable& b) {
    return a.qieid == b.qieid;
});
info.erase(newIt, info.end());

This will run in O(n+log n), whereas your original solution has O(n2) time complexity. Personally, I'd prefer this solution especially because it's more expressive and so it's easier to see what the code is doing.

Note: If your compiler already supports C++ 14's generic lambdas, you can simplify the expressions even more by using: [](const auto& a, const auto& b) { ... }

lethal-guitar
  • 4,438
  • 1
  • 20
  • 40
2

you can remove duplicate with the std::set trick, it's quite readable, and easily maintanable, but is less efficient.

std::vector<int> r = {14,26,58,56,26,14};

std::set<int> s(r.begin(),r.end());
r = std::vector<int>(s.begin(),s.end());

You'll need to give the std::set your own comparison function in order for it to work with your structure.

Kiroxas
  • 891
  • 10
  • 24