4

Is there a fast way to find all the single elements (those that appear only once) in a vector of elements? Every element in the vector is either single or dual (appears exactly twice). My approach is to sort all the elements and then remove the ones that appear twice. Is there any faster way to do it?

Alecto Irene Perez
  • 10,321
  • 23
  • 46
user1899020
  • 13,167
  • 21
  • 79
  • 154
  • 3
    Yes, if extra space is allowed: build a map with the elements as keys and their occurrence counts as values, then traverse the map and collect the elements whose count is 1 (in your case). – Yaman Jain May 08 '19 at 04:49
  • 3
    So you want something like a set not a vector? –  May 08 '19 at 04:54
  • 1
    What type are these elements? int, string or user-defined types? – P.W May 08 '19 at 04:56
  • 2
    What is the expected size of your input? What data type does your vector hold? Do you have any memory constraints? The answer depends on all of these, especially the first one. – unlut May 08 '19 at 06:01
  • What is the range of the input numbers? If it is small, say each number is between `1` and `100`, then you can create another array and set a flag at the index position, e.g. `seen[arr[i]] = 1`, then traverse the `seen` array from `1` to `100` and print every element where `seen[i]` is `1`. – Mayur May 08 '19 at 07:51
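
If the value range really is small and known, a plain counting array works well. Below is a minimal sketch of that idea, assuming values in [1, 100]; note that a count (rather than a plain seen/not-seen flag) is needed to tell single elements apart from dual ones. The variable names and sample input are illustrative.

#include <iostream>
#include <vector>

int main() {
    // Illustrative input; values are assumed to lie in [1, 100]
    std::vector<int> arr = {1, 2, 3, 1, 2, 5};

    // Count occurrences by direct indexing
    std::vector<int> count(101, 0);
    for (int x : arr) ++count[x];

    // Report the values that appeared exactly once
    for (int v = 1; v <= 100; ++v)
        if (count[v] == 1)
            std::cout << v << " ";   // prints: 3 5
}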

2 Answers

4

For small enough n (<= 1e8), the sort-and-remove approach (using `std::sort()` and `std::unique()`) is still faster than hash tables.

Sample code: O(n log n)

std::vector<int> A = {1, 2, 3, 1, 2, 5};
std::sort(A.begin(), A.end());
// std::unique packs the distinct values to the front; erase() drops the leftover tail
A.erase(std::unique(A.begin(), A.end()), A.end());
for (int x : A)
    std::cout << x << " ";
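
This snippet deduplicates A (it keeps one copy of every value). A minimal sketch of the same sort-based idea that instead keeps only the values occurring exactly once (the main wrapper and variable names are illustrative) could look like this:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> A = {1, 2, 3, 1, 2, 5};
    std::sort(A.begin(), A.end());

    // After sorting, equal elements are adjacent; keep values whose run length is 1
    std::vector<int> singles;
    for (std::size_t i = 0; i < A.size(); ) {
        std::size_t j = i;
        while (j < A.size() && A[j] == A[i]) ++j;
        if (j - i == 1) singles.push_back(A[i]);
        i = j;
    }

    for (int x : singles)
        std::cout << x << " ";   // prints: 3 5
}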
Photon
  • 2,717
  • 1
  • 18
  • 22
  • 1
    Is it though? There is some inflection point where using a hash table is strictly better than using a sort in terms of time. By definition, the best performance depends on the size of the input (as well as the hash and sort algorithms being used). – Joel Cornett May 08 '19 at 05:28
  • @JoelCornett yes, but in practice we probably won't reach that point. With 1e7 elements this approach is still faster than a hash table, and with 1e8 elements the hash table already requires too much memory to fit in my RAM. – Photon May 08 '19 at 05:30
  • 3
    use a bloom filter ;) – Joel Cornett May 08 '19 at 05:32
  • How about with 1e9 elements? What if the element size is large? Neither would have enough memory. In order for your answer to be correct, you need to qualify it with *for small enough n*... – Joel Cornett May 08 '19 at 05:36
  • @JoelCornett thanks for your feedback, updated my answer – Photon May 08 '19 at 05:45
  • no problem. One more nitpick, you cannot even say that 1e8 is categorically ok. It’s “probably” ok in practice, but it depends on the size of the elements, as well as page size and memory latency. All of those things can vary quite a bit from system to system (embedded vs server, for example). – Joel Cornett May 08 '19 at 05:55
  • The OP does not specify any of the above parameters – Joel Cornett May 08 '19 at 05:57
  • 1
    Exactly, while it is good to point out limitations, avoid imposing constraints or hypothetical limits not asked for in the original question. – David C. Rankin May 08 '19 at 05:59
  • @JoelCornett Thanks for commenting `bloom filter` <3 never knew about this :) – Mayur May 08 '19 at 08:00
1

If your elements are hashable, you can use an `std::unordered_map<T, int>` to store the count of each element, which takes expected linear time:

#include <unordered_map>
#include <vector>

template<typename T>
std::vector<T> uniqueElements(const std::vector<T>& v) {
    // Count how often each element occurs
    std::unordered_map<T, int> counts;
    for (const auto& elem : v) ++counts[elem];

    // Collect the elements that occurred exactly once
    std::vector<T> result;
    for (const auto& [elem, count] : counts)
        if (count == 1)
            result.push_back(elem);
    return result;
}

For small lists, sorting and then doing a linear pass might still be faster. Also note that this copies your elements, which might be expensive in some cases.
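
A quick usage sketch, assuming the uniqueElements() template above is in scope (the main wrapper and sample data are illustrative); note that the output order follows the unordered_map's iteration order, so it is unspecified:

#include <iostream>
#include <vector>

// Assumes uniqueElements() from above is declared before this point.
int main() {
    std::vector<int> v = {1, 2, 3, 1, 2, 5};
    for (int x : uniqueElements(v))
        std::cout << x << " ";   // prints 3 and 5, in unspecified order
}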

Chemistree
  • 432
  • 6
  • 11