1

I have a type called Neighbors:

typedef vector<pair<data,int>> Neighbors;

and here's data:

struct data
{
    int par[PARAMETERS];
    int cluster;
    bool visited;
    bool noise;
};

I'm trying to write a function that inserts values from _NeighborPts to NeighborPts (but only ones that aren't already in NeighborPts):

void insert_unique(Neighbors* NeighborPts, const Neighbors& _NeighborPts)
{
    Neighbors_const_it _it = _NeighborPts.begin();
    while(_it != _NeighborPts.end())
    {
        if(/* _it->first.par isn't in *NeighborPts */)
            NeighborPts->push_back(*_it);
        ++_it;
    }
}

and i already have a function equal() which checks if 2 pars are equal.

So do i have to iterate through NeighborPts in a while loop and check if the item is found? or could i use some built-in find or find_if function to do that for me?

Alaa M.
  • 4,961
  • 10
  • 54
  • 95
  • 4
    Why not use `std::set<>` if uniqueness is important? – johnsyweb May 24 '14 at 12:08
  • @Johnsyweb i need them to be unique only by parameter `par` (which is inside `struct data`). i think `set<>` allows uniqueness just for `first` or `second`, am i right? – Alaa M. May 24 '14 at 12:11
  • 4
    You can pass custom comparision function to std::set. – Mohit Jain May 24 '14 at 12:12
  • 3
    You must not use `_NeighborPts` as an identifier, since it's a reserved name. – Kerrek SB May 24 '14 at 12:22
  • @KerrekSB is it? i haven't got any errors/warning for this yet. i used it in all over the program. – Alaa M. May 24 '14 at 12:41
  • 6
    @AlaaM.: "In basketball you cannot hold the ball." - "But look, ma, I'm holding the ball!" – Kerrek SB May 24 '14 at 13:01
  • @AlaaM. Kerrek is correct, see http://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier you just had luck so far. :) – Baum mit Augen May 24 '14 at 14:19
  • @MohitJain `set<>` doesn't allow `[]` operator... and i need it. – Alaa M. May 24 '14 at 18:53
  • `operator []` is anyways not important because after inserting values `4 3 4 7 4 5`, you can not expect the index of 5. Still if you are keen with `operator []` as well as uniqueness, you can use `std::find` to make sure whether to insert or not. If you are concerned about efficiency, you can insert something like [treap](http://en.wikipedia.org/wiki/Treap) in which data uniqueness can be checked in `log n` time (BST) and indexes can be maintained as heap and are thus indexable in `log n` time. – Mohit Jain May 24 '14 at 19:06
  • @MohitJain i do need `[]`... can't briefly explain why. and search results show no treap in stl. – Alaa M. May 24 '14 at 19:25
  • stl does not have treap. You need to implement tree class (templatized or specialized for your need) by yourself. Or if performance is not a big concern, you can use `std::find`. If most of the time you insert into vector, and rarely use `operator []`, you can push data in `std::map` as key with value as index (map.size) and later iterate through map to resolve the index. – Mohit Jain May 24 '14 at 19:30
  • @Alaa M. see my revised answer – Daniel Heilper May 24 '14 at 21:20

3 Answers3

1

You can maintain a sorted vector. Use the lower_bound function from C++ algorithms to locate the insert position each time. If the element at the insert position is equal to the insert element then you have a duplicate.

The performance of this will be pretty good unless the vector grows too large. The point at which you're better off using a set or a unordered_set varies and you'd need to benchmark to find it.

Zan Lynx
  • 53,022
  • 10
  • 79
  • 131
0

Your current solution with vector will run in O(N^2) time, which is not efficient. For efficient solution an associative container will be great - such as std::set . Also you will need to have some "operator less" (instead of "equal ()"), to pass to the function.

template < class T,                        // set::key_type/value_type
           class Compare = less<T>,        // set::key_compare/value_compare
           class Alloc = allocator<T>      // set::allocator_type
           > class set;

So you need to provide compare class

struct data_compare {
    bool operator() (const data& lhs, const data& rhs) const{
      //...
    }
};

set<int64_t, data_compare> exising_items;

You may define such a function, or override "operator <" in struct data.

insert all "data" from "_NeighborPts" into a set - O(N*log(N)) time

std::set other_items; in a loop - iterate _NeighborPts and insert data elements

other_items.insert (_NeighborPts [i]);

std::set my_items; in a loop - iterate _NeighborPts and insert data elements

my_items.insert (NeighborPts [i]);

Now you need to compare between the 2 sets: You can do it using std::set_intersection . or construct a simple loop on the set "my_items" if the current element in other_items isn't in _my_items, insert it in "NeighborPts"

this solution will run in O(Nlog(N)) time

Daniel Heilper
  • 1,182
  • 2
  • 17
  • 34
  • Not that you are wrong, but I am not sure that vector should be ruled out in all cases. For one thing the order of insertion can be preserved and elements seem in this case to be added to the end. Maybe insertion is not the most costly operation on the container? – user2672165 May 24 '14 at 17:19
  • @user2672165 for searching, vector can't do the job. And you don't need to preserve the order in the reference vector. insertion of 1 element in the set is O(logN). So the whole solution runs in O(NlogN) instead of O(N^2) in the OP's. – Daniel Heilper May 24 '14 at 17:28
  • @hellfire769 i'm actually using `vector<>` because i need the `[]` operator... `set<>` doesn't preserve the order of insertion. Anyway, what's the time complexity for finding an element in `set<>`? is it O(logN) or O(N)? – Alaa M. May 24 '14 at 18:57
  • 1
    @Alaa M. set is only to read "_NeighborPts" (the const reference), not "NeighborPts". "NeighborPts" order is preserved. The complexity of finding an element inset is O(logN). So if you do it N times, you'll get O(NlogN) – Daniel Heilper May 24 '14 at 21:03
  • @Alaa Also, "_NeighborPts" is const reference so it shouldn't be changed anyway.in the second loop you iterate on "NeighborPts" and find if the current element is found in set. Anyway my solution isn't correct, I will revisit it – Daniel Heilper May 24 '14 at 21:11
0
  1. There is no getting around iterating over the items in _NeighborPts.
  2. As long as you are using an std::vector, there is no getting around the check to determine whether an item is in NeighborPts before inserting in it.

You can make the code a little bit easier to read by using std::for_each and a functor.

struct UniqueItemInserter
{
   UniqueItemInserter(Neighbors* neighborsIn) : neighbors(neighborsIn) {}

   void operator(pair<data,int> const& item)
   {
     if ( std::find(neighbors->begin(), neighbors->end(), item) != neighbors->end() )
     {
        neighbors->push_back(item);
     }
   }

   Neighbors* neighbors;
};

void insert_unique(Neighbors* NeighborPts, const Neighbors& _NeighborPts)
{
    std::for_each(_NeighborPts.begin(), _NeighborPts.end(), UniqueItemInserter(NeighborPts));
}
R Sahu
  • 204,454
  • 14
  • 159
  • 270