1

If I have a vector of vector let's call it:

vector<vector<int> > data;

and in data it has numbers like

0 1
0 3
0 4
1 0
1 2
1 5
3 0

how could I get rid of the data that is a reverse of itself? For example: 0 1 and 1 0 and I would like to get rid of 1 0 because I already saw 0 1. Another example: 0 3 and 3 0 and I would like to get rid of 3 0 because I already saw 0 3.

So the data would instead be this:

0 1
0 3
0 4
1 2
1 5

What would be the easiest way to do this?

Mdjon26
  • 2,145
  • 4
  • 19
  • 29

4 Answers4

1

Since you probably want to print out the values without their opposites, you could do this:

for each pair:
    if it exists in the HashMap:
       do nothing
    else
        add the opposite to a HashMap 
        print the pair
Bartlomiej Lewandowski
  • 10,771
  • 14
  • 44
  • 75
1

If you can afford to use a lot of memory, and the maximum size of the integers is small like in your example, I would simply create a bit-vector big enough to hold the entire search space. Compute an index into this bit vector from both input numbers.

int N_POSSIBLE_PAIRS = (1 << MAX_BITS) * (1 << MAX_BITS);

// vector<bool> is specialized - it only uses 1 bit per entry
std::vector<bool> bitset(N_POSSIBLE_PAIRS);

int index = (first << MAX_BITS) | second;

// in a loop,
if (bitset[index]) {
    // duplicate
}
else {
    int reverse_index = (second << MAX_BITS) | first;
    bitset[index] = true;
    bitset[reverse_index] = true;
}

This actually wastes 2x space - you could fix that with a more complex indexing scheme if necessary.

If the maximum size of the integers is too big, or you are restricted in memory, or you just prefer to be frugal with memory, then I would sort the pairs lexicographically and use binary search to check for duplicates.

It's also possible that my suggestion will perform poorly on sparse data because it is not cache-friendly.

japreiss
  • 11,111
  • 2
  • 40
  • 77
1

You can push the vectors in a set and check if the reverse is already in the set. Something like this:

C++11 version:

#include <iostream>
#include <vector>
#include <algorithm>
#include <set>

using namespace std;

vector<int> myrev(vector<int>& f)
{
  vector<int> s;
  s.push_back(f[1]);
  s.push_back(f[0]);
  return s;
}
int main()
{
  vector<vector<int> > data={{0,1},{0,3},{0,4},{1,0},{1,2},{1,5},{3,0},{1,0}};
  set<vector<int> > unique_data;
  for(auto& x: data)
  {
    if(unique_data.find(myrev(x))==unique_data.end())
      unique_data.insert(x);
  }
  for(auto& x: unique_data)
  {
    cout << x[0] << ":" << x[1] << endl;
  }
  return 0;
}

C++98 version:

#include <iostream>
#include <vector>
#include <algorithm>
#include <set>

using namespace std;

vector<int> myrev(vector<int>& f)
{
  vector<int> s;
  s.push_back(f[1]);
  s.push_back(f[0]);
  return s;
}
int main()
{
  vector<vector<int> > data;
  //lame C++98 initialization of the vector
  vector<int> tmp(2);
  tmp[0]=0;tmp[1]=1;
  data.push_back(tmp);
  tmp[0]=0;tmp[1]=3;
  data.push_back(tmp);
  tmp[0]=0;tmp[1]=4;
  data.push_back(tmp);
  tmp[0]=1;tmp[1]=0;
  data.push_back(tmp);
  tmp[0]=1;tmp[1]=2;
  data.push_back(tmp);
  tmp[0]=1;tmp[1]=5;
  data.push_back(tmp);
  tmp[0]=3;tmp[1]=0;
  data.push_back(tmp);

  set<vector<int> > unique_data;
  for(vector<vector<int> >::iterator x=data.begin(); x!=data.end(); x++)
  {
    if(unique_data.find(myrev(*x))==unique_data.end())
      unique_data.insert(*x);
  }
  for(set<vector<int> >::iterator x=unique_data.begin(); x!=unique_data.end(); x++)
  {
    cout << (*x)[0] << ":" << (*x)[1] << endl;
  }
  return 0;
}
Johnny Mnemonic
  • 3,822
  • 5
  • 21
  • 33
0

Try this (assuming there are no negative numbers in your list, and for space reasons, the numbers are not too large):

1) Create a bitset 2d grid with size MxM (where M is the maximum number to expect). Set each bit in the grid to 0

2) For each pair of numbers (x,y):

check if grid(x,y) is 1.  If yes then you have a duplicate
else
check if grid(y,x) is 1.  If yes then you have a duplicate.
set grid(x,y) and grid(y,x) to 1
PaulMcKenzie
  • 34,698
  • 4
  • 24
  • 45
  • But then (0, 2) and (1, 0) will make it seem like (0, 0) has been seen. – japreiss Apr 01 '14 at 18:03
  • Changed the algo to use a grid of bits. Probably still not the best, but it more than likely works. Looks like your answer though, so you beat me to it. Give you a +1. – PaulMcKenzie Apr 01 '14 at 18:28