
I need your help to solve the following problem:

Is there a function in C++/OpenCV which is equivalent to the following code:

np.random.choice(len(vec), samples, p=probabilities[:,0], replace=True)

Thanks in advance.

jack
  • it looks like you are using the function incorrectly. It should probably be [`numpy.random.choice(samples, size=len(vec), replace=True, p=probabilities[:,0])`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html) – JHBonarius Mar 21 '17 at 12:34

3 Answers

Well, let's look at numpy.random.choice(a, size=None, replace=True, p=None) (see my comment; I guess you mixed up some of the function's parameters).

For the input a you are using an array of samples. As an output size you want len(vec), you want sampling with replacement and have a custom non-uniform distribution.

It is probably sufficient to first generate an array of indices using a random distribution and then use those indices to pick the selected elements.

C++ offers help in generating non-uniformly distributed numbers in the form of std::discrete_distribution.

Example:

#include <random>
#include <vector>
#include <algorithm>
#include <iterator> // back_inserter, cbegin, cend, next
#include <iostream>

int main()
{
    auto const samples = { 1, 2, 3, 4, 5, 6 }; // deduces to std::initializer_list<int>
    auto const probabilities = { 0.1, 0.2, 0.1, 0.5, 0.0, 1.0 }; // deduces to std::initializer_list<double>
    if (samples.size() < probabilities.size()) {
        std::cerr << "If there are more probabilities than samples, you will get out-of-bounds indices = UB!\n";
        return -1;
    }

    // generate non-uniform distribution (default result_type is int)
    std::discrete_distribution const distribution{probabilities};
    // note, for std::vector or std::array of probabilities, use
    // std::discrete_distribution distribution(cbegin(probabilities), cend(probabilities));

    int const outputSize = 10;

    std::vector<decltype(distribution)::result_type> indices;
    indices.reserve(outputSize); // reserve to prevent reallocation
    // use a generator lambda to draw random indices based on distribution
    std::generate_n(back_inserter(indices), outputSize,
        [distribution = std::move(distribution), // could also capture by reference (&) or construct in the capture list
         generator = std::default_random_engine{}  //pseudo random. Fixed seed! Always same output.
        ]() mutable { // mutable required for generator
            return distribution(generator);
        });

    std::cout << "Indices: ";
    for(auto const index : indices) std::cout << index << " ";
    std::cout << '\n';

    // just a trick to get the underlying type of samples. Works for std::initializer_list, std::vector and std::array
    std::vector<decltype(samples)::value_type> output;
    output.reserve(outputSize); // reserve to prevent reallocation
    std::transform(cbegin(indices), cend(indices),
        back_inserter(output),
        [&samples](auto const index) {
            return *std::next(cbegin(samples), index);
            // note, for std::vector or std::array of samples, you can use
            // return samples[index];
        });

    std::cout << "Output samples: ";
    for(auto const sample : output) std::cout << sample << " ";
    std::cout << '\n';
}

On godbolt.org

edit: each draw from std::discrete_distribution is independent of the previous draws, so this is sampling with replacement.
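
Because the lambda above uses a default-constructed engine with a fixed seed, the same idea can also be wrapped in a small function that takes the engine as a parameter, so the caller decides how to seed it. This is only a sketch: `random_choice` is a hypothetical name (not a standard library or OpenCV function), and it assumes the probabilities are non-negative and not all zero.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of the standard library or OpenCV).
// Draws `size` indices in [0, probabilities.size()) with replacement;
// index i is drawn with probability proportional to probabilities[i].
std::vector<int> random_choice(std::vector<double> const& probabilities,
                               std::size_t size,
                               std::mt19937& generator)
{
    assert(!probabilities.empty()); // assumption: at least one entry

    std::discrete_distribution<int> distribution(probabilities.begin(),
                                                 probabilities.end());
    std::vector<int> indices;
    indices.reserve(size); // reserve to prevent reallocation
    for (std::size_t i = 0; i < size; ++i) {
        indices.push_back(distribution(generator)); // each draw is independent
    }
    return indices;
}
```

A caller that wants different output on every run could construct the engine as `std::mt19937 generator{std::random_device{}()};` and then call `random_choice(probabilities, samples, generator)`, which roughly plays the role of the NumPy call in the question. Passing an engine with a fixed seed keeps the output reproducible.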

JHBonarius

It seems you are looking to sample from a discrete random distribution (std::discrete_distribution).

The example on its reference page is fairly demonstrative:

// discrete_distribution
#include <iostream>
#include <random>
#include <string> // std::string is used to print the stars

int main()
{
  const int nrolls = 10000; // number of experiments
  const int nstars = 100;   // maximum number of stars to distribute

  std::default_random_engine generator;
  std::discrete_distribution<int> distribution {2,2,1,1,2,2,1,1,2,2};

  int p[10]={};

  for (int i=0; i<nrolls; ++i) {
    int number = distribution(generator);
    ++p[number];
  }

  std::cout << "a discrete_distribution:" << std::endl;
  for (int i=0; i<10; ++i)
    std::cout << i << ": " << std::string(p[i]*nstars/nrolls,'*') << std::endl;

  return 0;
}
C8H10N4O2
Joseph Ireland
  • ok, I took too long to type my answer; maybe I should also just have copied the example code from the website instead of trying to write code that represents his question ;P – JHBonarius Mar 21 '17 at 13:11
  • I did see the solution, but I have probabilities instead! Would it work even then? – jack Mar 21 '17 at 13:14
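
On the question of probabilities versus weights: std::discrete_distribution normalizes whatever non-negative weights it is given, so probabilities that already sum to 1 can be passed in directly. A small sketch to check this, where `effective_probabilities` is a hypothetical helper name and not a library function:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: returns the probabilities that
// std::discrete_distribution actually uses for the given weights.
std::vector<double> effective_probabilities(std::vector<double> const& weights)
{
    assert(!weights.empty()); // assumption: at least one weight

    std::discrete_distribution<int> distribution(weights.begin(), weights.end());
    return distribution.probabilities(); // each weight divided by the sum of all weights
}
```

For the weights {2,2,1,1,2,2,1,1,2,2} from the example, the first entry comes out as 2/16 = 0.125. For an input such as {0.1, 0.2, 0.7}, the returned values match the input up to floating-point rounding.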

I don't think there is a function that gives you this for free. You may have to write it yourself.

Some hints as to how to write such a function:

  • Let us say you have a vector<float> storing your probabilities. First use std::partial_sum on this vector to get the accumulated probabilities of the elements.
  • Then, for each sample, generate a random floating-point number between 0 and 1. Let us call it random_value. Iterate over your vector of accumulated probabilities until you find a value bigger than random_value. The index at this point is your sample index. Take the value at this index in your samples vector, store it somewhere and repeat.
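
The two hints above could be sketched roughly as follows. `sample_indices` is a hypothetical name, the function returns indices rather than the sampled values, and std::upper_bound stands in for the "iterate until you find a bigger value" step (it finds the same position with a binary search instead of a linear scan). The sketch assumes non-negative probabilities that are not all zero, and it draws the random number between 0 and the total sum so the input does not have to sum to exactly 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper illustrating the hints above; not a library function.
std::vector<std::size_t> sample_indices(std::vector<float> const& probabilities,
                                        std::size_t count,
                                        std::mt19937& generator)
{
    assert(!probabilities.empty());

    // Accumulated probabilities: element i holds p[0] + ... + p[i].
    std::vector<double> cumulative(probabilities.size());
    std::partial_sum(probabilities.begin(), probabilities.end(), cumulative.begin());
    assert(cumulative.back() > 0.0); // assumption: not all probabilities are zero

    // Random value between 0 and the total (the total is 1 for normalized input).
    std::uniform_real_distribution<double> uniform(0.0, cumulative.back());

    std::vector<std::size_t> indices;
    indices.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        double const random_value = uniform(generator);
        // First accumulated probability that is bigger than random_value.
        auto it = std::upper_bound(cumulative.begin(), cumulative.end(), random_value);
        if (it == cumulative.end()) --it; // guard against rounding at the upper end
        indices.push_back(static_cast<std::size_t>(it - cumulative.begin()));
    }
    return indices;
}
```

The returned indices can then be used to look up the corresponding entries in the samples vector.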
Sunreef
  • why should he accumulate the probabilities of his elements? – JHBonarius Mar 21 '17 at 13:12
  • @J.H.Bonarius Because this way, he can sample a number between 0 and 1 and get the right element by comparing with the partial sums. It wouldn't work to compare with each element's individual probability. Of course, if you are using `std::discrete_distribution`, you don't need to worry about that. – Sunreef Mar 21 '17 at 13:15