2

Is there a way to randomly retrieve an element from C++ unordered_set in O(1) average time? Instead of doing

std::unordered_set<int> s;
// initialize s
auto start = s.begin();
for (int i = 0; i < rand()%s.size()-1; ++i, ++start) {}
int randomNumber = *start;

Updated:

I need to fight for the post, so I add my reasons for needing the functionality of above.

I am playing with implementing a maze generator. And somehow I need a data structure which would support:

  1. insertion / deletion in O(1)
  2. random retrieve an element from the data structure in O(1)

std::vector has random access, but insertion / deletion is expensive

std::list has no random access

std::set support O(logN) random access, and O(logN) insertion/ deletion, which are great, but my initialization is a sorted sequence which would easily break the balance of it.

So I thought hash table would be the best choice, however randomly retrieval an element would be nontrivial.

Thank you for your time.

  • 2
    What do you mean? This **is** O(1) expected (although it’s bad code because it uses `rand` wrongly, and `rand` should be considered deprecated in favour of the [`` header](http://en.cppreference.com/w/cpp/header/random), anyway). Furthermore, [“random access”](http://en.wikipedia.org/wiki/Random_access) usually means something else. – Konrad Rudolph Jan 20 '15 at 21:56
  • 1
    By "random access," do you mean "choose a random element of the set?" – templatetypedef Jan 20 '15 at 21:59
  • In this code, the worst case is that `rand()%s.size()` always returns a number higher than `i`. That means that you will access `n` elements, so this is obviously `O(n)`. This is probably not what you want, but it is what you'll get. – Rafael Lerm Jan 20 '15 at 22:00
  • @RafaelLerm Notice that it’s comparing to `i`, but incrementing `start`. I’m *assuming* that what OP actually *wanted* to do is something else, namely pick *n* random elements … – Konrad Rudolph Jan 20 '15 at 22:02
  • I saw that, but `i` is still the condition of the loop, so I think the worst case still holds. I still haven't decided what the OP wants. It's either pick n random elements, in which case the O(1) question is just silly, or he wants to get one random element. – Rafael Lerm Jan 20 '15 at 22:04
  • I should say random retrieval instead of random access –  Jan 20 '15 at 22:04
  • Why would you need a `std::unordered_set` for picking a random number from a set of unique numbers? – 101010 Jan 20 '15 at 22:05
  • @RafaelLerm Oops, I forgot ++i in the for loop –  Jan 20 '15 at 22:07
  • This is a duplicate of http://stackoverflow.com/questions/12761315/random-element-from-unordered-set-in-o1, and maybe http://stackoverflow.com/questions/12288486/how-to-efficiently-select-a-random-element-from-a-stdset. – Rafael Lerm Jan 20 '15 at 22:09
  • @RafaelLerm this is not a duplicate. The first reference does not provide a way to randomly retrieve an element from hash table, and the second is std::set, not std::unordered_set –  Jan 20 '15 at 22:16
  • @Vindicate You're right about the second one, but the first one seems the same to me. Even the answers so far are going the same way. – Rafael Lerm Jan 20 '15 at 22:27

3 Answers3

2

You can't pick a random element from an unordered_set in O(1) time. The iterators are ForwardIterators, not RandomAccessIterators. You would have to use a different container. Either boost::container::flat_set<int> or write your own that also has something like a vector internally:

template <typename T>
class set_with_random_access
{
    std::vector<T*> vec;
    std::unordered_set<T> set;
};

For which we provide functions that keep those in line, like insertion:

void insert(const T& value) {
    auto pr = set.insert(value);
    if (pr.second) {
        vec.push_back(&*pr.first);
    }
}

And random-ness:

template <typename GEN>
T& random(GEN& gen) {
    std::uniform_int_distribution<size_t> dist(0, vec.size() - 1);
    return *vec[dist(gen)];
}

Which is, frankly, a lot of work, so probably use the boost one.

Barry
  • 286,269
  • 29
  • 621
  • 977
  • If the `unordered_set`'s hash table rehashes, the pointers in the `vector` become invalid. This can be avoided if the number of elements is known. Otherwise, I believe a copy of the element needs to belong to both (or some "smart pointer"-like object). – jxh Jan 21 '15 at 17:03
2

a way to randomly retrieve an element from C++ unordered_set in O(1) average time?

Depends what counts as "random" for your purposes, and whether being a tiny smidge above O(1) is good enough. You can pick a random bucket "b" between 0 and s.bucket_count() - 1, repeating if the bucket's empty, then a list index li between 0 and s.bucket_size(b) - 1, then std::advance(s.begin(li)) to get an iterator to a "random" element, but, consider this situation:

You roll three dice - then randomly pick one of those: you get a random 1-6 value with even probability, but if you keep picking without rolling again you can only ever get whatever value(s) ended up on the three dice: the probabilities of each value from 1 through 6 are severely uneven.

The above approach to picking a random element in an unordered_set is a little like that: if there are x buckets with elements, then each bucket has an even chance of being selected, but the elements in that bucket have 1 / x / bucket_size() chance of selection which - for any given bucket - may be less or more than 1 / size(). In other words, if you consider the hashing effectively random, then the various elements have equal chance of being favoured or penalised in their placement, but that "skew" is then set it stone until the table data's significantly mutated or the table's rehashed (and if it's rehashed by say doubling table size rather than to a larger prime number (vague memory that unordered_set doubles), then once-penalised values will tend to remain penalised half the time).

The big-O efficiency of the above is a tiny smidge above O(1) because:

  • there's some repetition in the initial probe to find a bucket with elements, but with a load factor of 1.0 it's unlikely to need more than a couple attempts (given a good hash function); other options are available - like iterating from an empty bucket, or jumping by various displacements (modded into the table size) - which may perform a little better than trying another completely random bucket but may also exacerbate discrepancies in the odds of element selection

  • there's linear iteration in the elements colliding in any given bucket, but as the default load factor is 1.0 it'll be rare to have more than a couple collisions, and increasingly extremely rare to have many more than that.

Tony Delroy
  • 102,968
  • 15
  • 177
  • 252
1

Picking a random element from a std::unordered_set is a bad idea. This is due to the fact that std::unordered_set doesn't support random access and thus doesn't have a subscript operator (i.e., operator[]).

I strongly believe that what you need is a std::vector in combine with std::unique in order to satisfy element uniqueness.

In the example below I use a std::vector and then I ensure that it has only unique elements by applying std::unique algorithm on it. Then I use the random utilities in order to generate a random index in [0, vector's size - 1]:

std::vector<int> v{1, 2, 8, 3, 5, 4, 5, 6, 7, 7, 9, 9, 19, 19};
v.erase(std::unique(v.begin(), v.end()), v.end());

std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(0, v.size() - 1);

std::cout << "Random number from vector: " << v[distribution(generator)] << std::endl;

LIVE DEMO

101010
  • 41,839
  • 11
  • 94
  • 168