26

I have a vector containing n elements. I need to choose a subset of m elements randomly from the vector without repetition. What is the most efficient way of doing this? I need to do this several thousands of times in my code.

The solution on top of my mind is to use rand() to generate a random number k between 0 and n. Then pick the kth element in the vector and insert it into a std::set. Keep doing this till the set's size becomes equal to m. I am now assured that the set contains m unique elements randomly chosen from the set of n elements.

What are the other possible solutions?

Thanks.

Vinay
  • 373
  • 1
  • 6
  • 12
  • 6
    Do `std::random_shuffle()` on vector and pull first `m` elements out of it, perhaps? – jrok Feb 18 '12 at 23:25
  • 2
    @jrok: while simple, that is _highly inefficient when `m` is much smaller than `n`. – Mooing Duck Feb 19 '12 at 00:52
  • possible duplicate of [Algorithm to select a single, random combination of values?](http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values) – Jerry Coffin Feb 19 '12 at 03:44
  • @MooingDuck But the fact is that `std::random_shuffle()` is just a full permutation which also uses *Fisher-Yates shuffle*. See the [first version of possible implementation](https://en.cppreference.com/w/cpp/algorithm/random_shuffle) from cppreference. For those who wants to understand the accepted answer and also `std::random_shuffle()`, it's worth for a read. – Rick Oct 25 '18 at 07:46
  • 1
    @Rick: Sure, it's the same algorithm, but instead of stopping after `m` elements, it does the entire data set, which is a massive waste of time. Its the difference between `std::sort` and [`std::partial_sort`](https://en.cppreference.com/w/cpp/algorithm/partial_sort). – Mooing Duck Oct 25 '18 at 16:32

3 Answers3

43

You want a Fisher-Yates shuffle (stop after M iterations):

template<class BidiIter >
BidiIter random_unique(BidiIter begin, BidiIter end, size_t num_random) {
    size_t left = std::distance(begin, end);
    while (num_random--) {
        BidiIter r = begin;
        std::advance(r, rand()%left);
        std::swap(*begin, *r);
        ++begin;
        --left;
    }
    return begin;
}

Demo at http://ideone.com/3A3cv. This is significantly faster than std::random_shuffle when you only need a few random numbers out of the set, and should be just about the same speed even if N==M.

akim
  • 8,255
  • 3
  • 44
  • 60
Michael Burr
  • 333,147
  • 50
  • 533
  • 760
  • @Burr Thanks! I have a million elements in my vector out of which I need to pick just 100 elements in random. This is exactly what I was looking for. – Vinay Feb 20 '12 at 17:09
  • 3
    rand(): see http://codereview.stackexchange.com/questions/39001/fisher-yates-modern-shuffle-algorithm – dani Feb 20 '16 at 22:40
  • Ya.. This is a **Stop after x selection** version of `std::random_shuffle`.. I am curious why STL not overloading this version. Thanks btw :D. – Rick Oct 25 '18 at 07:49
4

One way you could do this is to create a list of all the indices of the vector, shuffle them, and take the first n to be the indices of the selected objects:

struct rangegenerator {
    rangegenerator(int init) : start(init) { }

    int operator()() {
        return start++;
    }

    int start;
};

vector<T> numbers; // this is filled somewhere else

vector<int> indices(numbers.size());

generate(begin(indices), end(indices), rangegenerator(0));

random_shuffle(begin(indices), end(indices));

// then take the first n elements of indices and use them as indices into numbers
Seth Carnegie
  • 73,875
  • 22
  • 181
  • 249
  • 5
    When `m` is much smaller than `n`, this is highly inefficient. It's not hard to come up with an answer that is faster than this for all `m` (where `m` is less than `n`) – Mooing Duck Feb 19 '12 at 00:53
  • @Seth: Will have to agree with Moo. This is probably one of the worst ways to accomplish the given task - not sure why the OP marked it as an answer. The correct answer is obviously Burr's answer. – Jared Krumsie Feb 19 '12 at 01:55
  • 2
    @JaredKrumsie the OP asked for "other possible solutions" and what I wrote is definitely a possible solution. The only way an answer would be incorrect is if it didn't work at all. – Seth Carnegie Feb 19 '12 at 02:03
1

Since C++20,
We can use std::ranges::sample():

#include <cassert>
#include <iostream>
#include <random>
#include <vector>


void Test() {
  std::mt19937_64 random_engine{std::random_device{}()};
  std::vector<int> in{{1, 2, 3, 4, 5}};
  std::size_t m{2};

  std::vector<int> out{};
  std::ranges::sample(in, std::back_inserter(out), m, random_engine);
  assert(out.size() == m);

  for (const int &elem : out) {
    std::cout << elem << std::endl;
  }
}
ALittleDiff
  • 1,191
  • 1
  • 7
  • 24