15

This question with an added constraint.

I'm willing to allow not-uniform selection as long as it's not to lop sided.

Given that "sets are typically implemented as binary search trees" and I expect they will contain some kind of depth or size information for balancing, I would expect you could do some sort of weighted random walk of the tree. However I don't know of any remotely portable way to do that.

Edit: The constraint is NOT for the amortized time.

Community
  • 1
  • 1
BCS
  • 75,627
  • 68
  • 187
  • 294
  • 1
    Now that's an interesting question, but *I* would implement it as a feature of the balanced tree, which is tough to do with the library implementation. – dmckee --- ex-moderator kitten Nov 28 '11 at 21:03
  • 4
    `std::set` is not defined to be a bstree. Its complexity requirements essentially mean that it can't be anything else, but the tree structure is not part of the standard, and hence not part of the interface, either. (If you had an actual balanced tree, you could pick a random element in O(log n) by picking the left or right child randomly until you're at the bottom.) Maybe a `random()` interface should be proposed for the next standard; after all, there's already a `random_shuffle` algorithm and this is no different. (By the way, you can do it in O(1) on an `std::unordered_set`.) – Kerrek SB Nov 28 '11 at 21:04
  • 1
    @KerrekSB: Perhaps pick left, right, or stop, so at least each element gets a chance. – GManNickG Nov 28 '11 at 21:13
  • @GMan: Yes, of course, thanks! The probabilities have to be adjusted accordingly. – Kerrek SB Nov 28 '11 at 21:16
  • Looks like I'll have to wait for the `random()` interface in C++2x. – BCS Nov 28 '11 at 21:36

5 Answers5

7

Introduce array with size equal to set. Make array elements hold addresses of every element in set. Generate random integer R bounded by array/set size, pick address in array's element indexed by R and dereference it to obtain set's element.

Victor Sorokin
  • 11,878
  • 2
  • 35
  • 51
  • 8
    And regenerate this array every time the set changes. – Violet Giraffe Nov 28 '11 at 21:12
  • 3
    True. Nowhere in OP post is said that set ever chages. Also, nowhere it's said there are any memory constraints :) – Victor Sorokin Nov 28 '11 at 21:13
  • 1
    That's also true :) Your solution looks like the only one possible through the set's public interface. – Violet Giraffe Nov 28 '11 at 21:15
  • ... also, nowhere in the post did he mention that you are not an idiot... make deductions from what he said. he wants to search in the std::set – Karoly Horvath Nov 28 '11 at 21:19
  • 2
    Hm, somehow that feels like cheating... "copy the set into a different data structure and then do something else"... I mean, it works, but it feels like it's missing the point. – Kerrek SB Nov 28 '11 at 21:25
  • Almost true -- we cache not set's elements themselves, but addresses of set's elements. This is more efficient, memory-wise. However, I must admit this answer is trivial :) – Victor Sorokin Nov 28 '11 at 21:28
  • 2
    I'm looking for a solution that remains valid regardless of how often the set is modified. – BCS Nov 28 '11 at 21:34
  • @Kerrek SB: I would say It doesn't even work.. what happens if you want to remove an element? – Karoly Horvath Nov 28 '11 at 21:39
  • @VioletGiraffe You can avoid this problem with the solution I give here : http://stackoverflow.com/a/31523081/2312686. It's O(log n) insertion and deletion while keeping O(1) random retrieval (with a O(n) space overhead). – matovitch Jul 22 '15 at 11:25
3

I don't see how to do it with just std::set, so you probably need a different data structure. Like Victor Sorokin said, you can combine a set with a vector. Instead of set<T>, use map<T, size_t>, plus vector< map<T, size_t>::iterator >. The value for each key is an index into the vector, and each element of the vector points back to the map element. The vector elements have no particular order. When you add an element, put it at the end of the vector. When you remove an element and it's not the last one in the vector, move the last element to the deleted element's position.

Derek Ledbetter
  • 4,675
  • 3
  • 20
  • 18
1

IF you know the distribution of the elements in the set, you can randomly select key (with that same distribution) and use std::set::lower_bound. That's a lot of if though.

int main() {
    std::set<float> container;
    for(float i=0; i<100; i += .01)  
        container.insert(i);
    //evenish distribution of 10000 floats between 0 and 100.
    float key = std::rand() *10000f / RAND_MAX; //not random, sue me
    std::set<float>::iterator iter = container.lower_bound(key); //log(n)
    std::cout << *iter;
    return 0;
}
Mooing Duck
  • 64,318
  • 19
  • 100
  • 158
  • 1
    @BCS: I don't think it can be done in logN with the standard set other than this, you'll have to roll your own. – Mooing Duck Nov 28 '11 at 23:13
0

For std::unordered_set<int> s:

1) take random R in min(s)..max(s)

2) if R in s: return R

3)

newIter = s.insert(R).first;
newIter++;
if (newIter == s.end()) {
    newIter = s.begin();
}
auto result = *newIter;
s.erase(R);
return result;

For ordered set (std::set) probability would depend on distance between elements. unordered_set is randomized by hash.

I hope this can help.

PS converting std::set<V> into std::set<std::pair<int, V>> (where first element in pair is a hash of second) makes this method suitable for any hashable V.

Aleksandr Pakhomov
  • 753
  • 1
  • 5
  • 13
0

You may be able to make a randomly-ordered copy of the map by using this constructor

template <class InputIterator>
set(InputIterator f, InputIterator l,
    const key_compare& comp)

..and passing a comparator that compares hashes of the keys (or some other deterministic spreading function.) Then take the "smallest" keys according to this new map.

You could construct the map once and amortize the cost across several requests for a "random" element.

phs
  • 10,687
  • 4
  • 58
  • 84