4

I was experimenting with a toy sample program:

map<int, char> foo { { 1, 'a' }, { 2, 'b' }, { 3, 'c' } };
vector<pair<decltype(foo)::key_type, decltype(foo)::mapped_type>> bar(size(foo));

sample(begin(foo), end(foo), begin(bar), size(foo), mt19937{ random_device{}() });

Live Example

But bar always contains the contents of foo in order. Is this a gcc implementation problem, or am I just repeatedly getting unlucky?

Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288

2 Answers2

5

std::sample selects elements from the range you pass. From cppreference (emphasize mine):

Selects n elements from the sequence [first; last) such that each possible sample has equal probability of appearance, and writes those selected elements into the output iterator out. Random numbers are generated using the random number generator g.

If n is greater than the number of elements in the sequence, selects last-first elements.

I think the docs could be more clear, but returning only last-first in case the number of requested elements is greater, only makes sense if each element is selected at maximum once.

Try:

map<int, char> foo { { 1, 'a' }, { 2, 'b' }, { 3, 'c' } };
vector<pair<decltype(foo)::key_type, decltype(foo)::mapped_type>> bar(size(foo)-1);

sample(begin(foo), end(foo), begin(bar), bar.size(), mt19937{ random_device{}() });

to get two random samples out of foo.

Also note that

The algorithm is stable only if PopulationIterator meets the requirements of ForwardIterator

ie it was not just out of luck that you always got the same result.

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
  • So first, `size(foo)` is *not* "greater than the number of elements in the sequence." It's equal to, so that condition shouldn't apply. And second, a `map` returns a bidirectional iterator, which exceeds the requirements of a forward iterator... so I'm not certain why you cite that section. All told this *still* sounds like a bug, however Visual Studio 2017 *and* Clang both give concurrent results. Do you think the http://en.cppreference.com specification of "greater than the number of elements" is in error? – Jonathan Mee Sep 10 '18 at 12:34
  • @JonathanMee correct, your `size(foo)` is not greater than the number of elements. I just put this argument, because only from "selects n elements" it isnt obvious that each element is selected only once. However, given that in case you ask for more elements than what are available you simply get all the elements, the only reasonable conclusion is that no element is selected more than once. Concerning bidirectional iterators exceeding the reuqirement of a forward iterator, this is not in contradiction to what I wrote. I just quoted this part, it explains why you always got the same result – 463035818_is_not_an_ai Sep 10 '18 at 12:44
  • @JonathanMee it is not "select" in the sense of "draw n elements with repetitions allowed" but rather "draw n element with no repetiotions allowed and order does not matter" – 463035818_is_not_an_ai Sep 10 '18 at 12:47
  • @JonathanMee in other words: it is not "draw n-times out of all elements" but rather "do n-times: draw one out of the remaining elements" – 463035818_is_not_an_ai Sep 10 '18 at 12:48
  • My understanding of your comments is that the statement: "If n is greater than the number of elements in the sequence, selects last-first elements" has no baring on the fact I always get the sequence in order, but perhaps the use of the bidirectional iterator does? – Jonathan Mee Sep 10 '18 at 12:49
  • @JonathanMee yep, you always get guaranteed same order because you use a bidirectional iterator. – 463035818_is_not_an_ai Sep 10 '18 at 12:50
  • @JonathanMee ...cppref does not mention greater or equal (only greater) because once you agreed how "selecting" works, selecting n out of n will get you `[first,last)` anyhow, no special note needed to clarify that – 463035818_is_not_an_ai Sep 10 '18 at 12:51
2

Sampling is about returning some subset of a larger population.

It is not intended to return elements in a random order, or any other order. It could, but that's not really what it's there for.

cppreference hints at ordering in this statement:

The algorithm is stable only if PopulationIterator meets the requirements of ForwardIterator

"Stable" here means it would return results in the same order as the input, thus the order is guaranteed to not be random with a ForwardIterator. Related: What is stability in sorting algorithms and why is it important?

This also makes sense, since, similar to what's been noted in the comments, to be efficient, you'll first need to figure out which elements to pick, and then go through the iterator and pick the elements, since you can only iterate from one direction to the other. Thus it should be trivial to keep the elements in the same order.

As for when you're not using a ForwardIterator, it makes no guarantee about the order one way or the other. So, even if it might appear to be randomly ordered, it would not be wise to rely on this, as the randomness of the ordering may be implementation-dependent and it may or may not have high entropy.

If you want a random order, you should shuffle it.

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
  • I was just typing up an answer to this effect. The thing that made it click for me is that [`sample`](https://en.cppreference.com/w/cpp/algorithm/sample) takes input iterators that are *forward iterators* where stuff like [`shuffle`](https://en.cppreference.com/w/cpp/algorithm/random_shuffle) takes *random access iterators*. The key obviously if `sample` is going to use a forward iterator efficiently it has to know what elements will be selected before iterating. – Jonathan Mee Sep 10 '18 at 14:11
  • @JonathanMee Edited. – Bernhard Barker Sep 10 '18 at 14:30