1

I am given a large vector that itself contains vectors of a specific data type e.g. std::vector<std::vector<double> > foo. I am trying to retrieve a random element foo[idx] from foo such that foo[idx] is non-empty or respectively foo[idx].empty() == false.

My naive guess would be to select random elements from foo until my constraint foo[idx].empty() == false is fulfilled.

However it is very likely the case that foo is very sparsely filled with non-empty vectors. Hence, my approach would quite likely be devastatingly slow.

Are there better approaches or should i think of a completely different data structure?

Flo Ryan
  • 385
  • 1
  • 15

5 Answers5

6

Maintain an auxiliary vector with the index of the non empty elements and get a random element from there

Gerard
  • 196
  • 6
  • This idea is not that efficient because you are storing 2x memory and it takes time to constantly make sure they are the same – Jake Freeman Dec 20 '17 at 12:21
  • It takes constant time to maintain using a hash table. About the memory, yes, it takes memory. but memory is free :D – Gerard Dec 20 '17 at 12:23
2

You could build an index of the nonempty elements:

std::vector<std::vector<double> > foo;
std::vector<decltype (foo)::iterator> nonempty;
for (auto it = foo.begin(); it != foo.end; ++it)
{
  if (! it->empty())
  {
    nonempty.push_back(it);
  }
}
std::random_device rd;
// random-number engine used (Mersenne-Twister in this case) 
std::mt19937 rng(rd());
// create a guaranteed unbiased index (unlike using modulo on rand)
std::uniform_int_distribution<size_t> uni_idx_dist(0,nonempty.size() - 1); 

auto &random_nonempty = *nonempty[uni_idx_dist(rng)];
Lanting
  • 3,060
  • 12
  • 28
0

You can first extract an index of what's non-empty, then select one:

std::vector<int> ind;
for (int i = 0; i < foo.size(); i++){
    if (! foo[i].empty()) {
        ind.push_back(i);
    }
}
int i = rand() % int.size();
return int[i];
iBug
  • 35,554
  • 7
  • 89
  • 134
0

You could build a vector of references to non empty vectors.

#include <algorithm>
#include <functional>
#include <iterator>
#include <random>
#include <vector>
#include <iostream>

int main() {
    using int_vec_t = std::vector<int>;
    std::vector<int_vec_t> v = {
        {0, 1, 2}, {}, {}, {3, 4, 5},
        {}, {6, 7, 8}, {}, {}, {9}, {10, 11}
    };

    // You can't put reference direcly, so use reference_wrapper instead
    std::vector<std::reference_wrapper<int_vec_t> > nonempty;
    nonempty.reserve(v.size());
    // "copy" non empty vectors. (Doesn't do copy, actually)
    std::copy_if(v.begin(), v.end(), std::back_inserter(nonempty), [](const int_vec_t& v) { return !v.empty();});
    if (nonempty.empty())
        return 0;
    // pick an element
    static std::random_device rd;
    static std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(0, nonempty.size() - 1);
    const int_vec_t& result = nonempty[dis(gen)];
    // dump result
    std::copy(result.begin(), result.end(), std::ostream_iterator<int>(std::cout, ", "));

    return 0;
}
ivaigult
  • 6,198
  • 5
  • 38
  • 66
0

There are a couple problems here:

  1. The ordinal vector is sparsely populated
  2. A vector of vectors is wateful

To solve problem 1 I'd recomend the use of a map<size_t, vector<double>> foo this will allow you to use non-linear indices, but it won't require the population of the intervening empty vectors. Selection of a random populated element here just involves advancing the iterator to point to the appropriate element. For example result will be the const-iterator to a random key-value pair in foo:

const auto idx = foo.empty() ? 0U : std::mt19937{std::random_device{}()}() % size(foo);
const auto result = next(cbegin(foo), idx);

The solution to 1 and 2 would be a bit more elaborate, as I'd recommend doing away with vectors all together in favor of multimap<size_t, double> foo This contains all the benefits of the map solution, but the trade off is that the keys must be iterated using upper_bound. Additionally, because multimap does not store a key count, size_t keyCount would need to be maintained alongside the multimap. Or presuming it was a temporary initialized to 0U it could be wastefully found at the time it is needed: for(auto it = cbegin(foo); it != cend(foo); it = foo.upper_bound(it->first)) ++keyCount; Using keyCount we can again find result which here will be an const-iterator to the first of the elements matching the random key:

int idx = keyCount == 0U ? 0 : std::mt19937{std::random_device{}()}() % keyCount;
auto result = cbegin(foo);

while(idx-- > 0) result = foo.upper_bound(result->first);
Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288