C++ Randomly select non-empty element of std::vector >

Question

I am given a large vector that itself contains vectors of a specific data type e.g. std::vector<std::vector<double> > foo. I am trying to retrieve a random element foo[idx] from foo such that foo[idx] is non-empty or respectively foo[idx].empty() == false.

My naive guess would be to select random elements from foo until my constraint foo[idx].empty() == false is fulfilled.

However it is very likely the case that foo is very sparsely filled with non-empty vectors. Hence, my approach would quite likely be devastatingly slow.

Are there better approaches or should i think of a completely different data structure?

@iBug `std::shuffle` will not help the program just condense it. — Jake Freeman, Dec 20 '17 at 12:11
@JakeFreeman It's also fine to create an index array then shuffle the indices. — iBug, Dec 20 '17 at 12:13
The random approach is good, especially if you can cache the indexes you have already tried so you don't try them again. — Some programmer dude, Dec 20 '17 at 12:14

score 6 · Accepted Answer · answered Dec 20 '17 at 12:13

6

Maintain an auxiliary vector with the index of the non empty elements and get a random element from there

answered Dec 20 '17 at 12:13

Gerard

196
6

This idea is not that efficient because you are storing 2x memory and it takes time to constantly make sure they are the same – Jake Freeman Dec 20 '17 at 12:21
It takes constant time to maintain using a hash table. About the memory, yes, it takes memory. but memory is free :D – Gerard Dec 20 '17 at 12:23

score 2 · Answer 2 · answered Dec 20 '17 at 12:29

You could build an index of the nonempty elements:

std::vector<std::vector<double> > foo;
std::vector<decltype (foo)::iterator> nonempty;
for (auto it = foo.begin(); it != foo.end; ++it)
{
  if (! it->empty())
  {
    nonempty.push_back(it);
  }
}
std::random_device rd;
// random-number engine used (Mersenne-Twister in this case) 
std::mt19937 rng(rd());
// create a guaranteed unbiased index (unlike using modulo on rand)
std::uniform_int_distribution<size_t> uni_idx_dist(0,nonempty.size() - 1); 

auto &random_nonempty = *nonempty[uni_idx_dist(rng)];

score 0 · Answer 3 · answered Dec 20 '17 at 12:15

0

You can first extract an index of what's non-empty, then select one:

std::vector<int> ind;
for (int i = 0; i < foo.size(); i++){
    if (! foo[i].empty()) {
        ind.push_back(i);
    }
}
int i = rand() % int.size();
return int[i];

answered Dec 20 '17 at 12:15

iBug

35,554
7
89
134

score 0 · Answer 4 · answered Dec 20 '17 at 13:47

You could build a vector of references to non empty vectors.

#include <algorithm>
#include <functional>
#include <iterator>
#include <random>
#include <vector>
#include <iostream>

int main() {
    using int_vec_t = std::vector<int>;
    std::vector<int_vec_t> v = {
        {0, 1, 2}, {}, {}, {3, 4, 5},
        {}, {6, 7, 8}, {}, {}, {9}, {10, 11}
    };

    // You can't put reference direcly, so use reference_wrapper instead
    std::vector<std::reference_wrapper<int_vec_t> > nonempty;
    nonempty.reserve(v.size());
    // "copy" non empty vectors. (Doesn't do copy, actually)
    std::copy_if(v.begin(), v.end(), std::back_inserter(nonempty), [](const int_vec_t& v) { return !v.empty();});
    if (nonempty.empty())
        return 0;
    // pick an element
    static std::random_device rd;
    static std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(0, nonempty.size() - 1);
    const int_vec_t& result = nonempty[dis(gen)];
    // dump result
    std::copy(result.begin(), result.end(), std::ostream_iterator<int>(std::cout, ", "));

    return 0;
}

score 0 · Answer 5 · answered Dec 20 '17 at 14:14

There are a couple problems here:

The ordinal vector is sparsely populated
A vector of vectors is wateful

To solve problem 1 I'd recomend the use of a map<size_t, vector<double>> foo this will allow you to use non-linear indices, but it won't require the population of the intervening empty vectors. Selection of a random populated element here just involves advancing the iterator to point to the appropriate element. For example result will be the const-iterator to a random key-value pair in foo:

const auto idx = foo.empty() ? 0U : std::mt19937{std::random_device{}()}() % size(foo);
const auto result = next(cbegin(foo), idx);

The solution to 1 and 2 would be a bit more elaborate, as I'd recommend doing away with vectors all together in favor of multimap<size_t, double> foo This contains all the benefits of the map solution, but the trade off is that the keys must be iterated using upper_bound. Additionally, because multimap does not store a key count, size_t keyCount would need to be maintained alongside the multimap. Or presuming it was a temporary initialized to 0U it could be wastefully found at the time it is needed: for(auto it = cbegin(foo); it != cend(foo); it = foo.upper_bound(it->first)) ++keyCount; Using keyCount we can again find result which here will be an const-iterator to the first of the elements matching the random key:

int idx = keyCount == 0U ? 0 : std::mt19937{std::random_device{}()}() % keyCount;
auto result = cbegin(foo);

while(idx-- > 0) result = foo.upper_bound(result->first);

C++ Randomly select non-empty element of std::vector >

5 Answers5