21

I have two vectors:

vector1 = [1 2 3 4 5 6 7 8 9]

vector2 = [1 2 3 4 5 6 7 8 9]

I want to ensure, that when I shuffle both using random_shuffle they should be shuffled in the same corresponding order. For example:

Output after shuffling should be like:

vector1 = [1 9 3 4 2 7 8 5 6]

vector2 = [1 9 3 4 2 7 8 5 6]

But I am getting output like:

vector1 = [5 1 7 4 2 3 9 8 6]

vector2 = [3 4 1 9 8 2 5 7 6]

Heres my code:

int main () 
{
  std::srand ( unsigned ( std::time(0) ) );
  std::vector<int> vector1, vector2;

  // set some values:
  for (int i=1; i<10; ++i)
  {
    vector1.push_back(i);
    vector2.push_back(i);
  }

  // using built-in random generator:
  std::random_shuffle ( vector1.begin(), vector1.end() );
  std::random_shuffle ( vector2.begin(), vector2.end() );

  // print out content:
  std::cout << "vector1 contains:";
  for ( std::vector<int>::iterator it1 = vector1.begin(); it1 != vector1.end(); ++it1 )
    std::cout << ' ' << *it1;

  std::cout << '\n';
  std::cout << '\n';

  std::cout << "vector2 contains:";
  for ( std::vector<int>::iterator it2 = vector2.begin(); it2 != vector2.end(); ++it2 )
    std::cout << ' ' << *it2;

  std::cout << '\n';
  std::cout << '\n';

  return 0;
}

EDIT This is an example case that I tried to implement. In practise, I have one vector of images and one vector of corresponding labels. I need them to be shuffled in the same manner. Could anybody please help...... thanks a lot!!

learner
  • 1,197
  • 6
  • 22
  • 34

8 Answers8

31

Instead of shuffling the vectors themselves, shuffle a vector of indexes into the other vectors. Since you'll be using the same indexes for both, they're guaranteed to be in the same order.

std::vector<int> indexes;
indexes.reserve(vector1.size());
for (int i = 0; i < vector1.size(); ++i)
    indexes.push_back(i);
std::random_shuffle(indexes.begin(), indexes.end());

std::cout << "vector1 contains:";
for ( std::vector<int>::iterator it1 = indexes.begin(); it1 != indexes.end(); ++it1 )
    std::cout << ' ' << vector1[*it1];
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
  • 1
    +1. This is more robust and testable than trying to control the shuffles by resetting the seed. – Adrian McCarthy Jun 06 '13 at 17:26
  • 1
    And in C++ 11, you can use `std::iota(indexes.begin(), indexes.end(), 0);` to create the initial index vector. – cxyzs7 Jun 06 '13 at 18:44
  • 1
    It's a good idea but needs a good implementation of how to shuffle, at best inplace, both vectors. – Jan Herrmann Jun 07 '13 at 07:10
  • @Mark even though i'm a beginner...but i wana learn smart programming! too good!!! crisp & clear & concise! thanks!!! – learner Jun 07 '13 at 09:23
  • 1
    @JanHerrmann, once you have the index (reorder) vector you can use an answer to one of these questions: http://stackoverflow.com/questions/838384/reorder-vector-using-a-vector-of-indices http://stackoverflow.com/questions/7365814/in-place-array-reordering – Mark Ransom Jun 07 '13 at 14:53
  • @JanHerrmann or you could code the algorithm yourself without using `std::random_shuffle` and shuffle both vectors simultaneously using the same random numbers in the [Fisher-Yates shuffle algorithm](http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle) but that would be a different answer. – Mark Ransom Jun 07 '13 at 15:01
  • As I read answers about in place array reordering I see that they are either O(N^2) or are not inplace. – Jan Herrmann Jun 10 '13 at 08:18
  • This common sense solution but requires that the indexes buffer of size N is created. Other solutions such as using boost zip_iterator or custom swap function are most elegant solution when you have the following requirements: 1) Memory use is concern 2) You are using a true random_device which does not reproduce sequences of numbers – Crog Mar 12 '19 at 10:17
17

Make sure you use the same seed for both calls to random_shuffle():

auto seed = unsigned ( std::time(0) );

// ...

std::srand ( seed );
std::random_shuffle ( vector1.begin(), vector1.end() );

std::srand ( seed );
std::random_shuffle ( vector2.begin(), vector2.end() );

Notice, however, that the Standard does not specify that random_shuffle() should use the rand() function to generate a random permutation - this is implementation-defined. Therefore, srand() will not affect the result of random_shuffle() on implementations that do not use rand().

Paragraph 25.3.12/4 of the C++11 Standard on random_shuffle() specifies:

Remarks: To the extent that the implementation of these functions makes use of random numbers, the implementation shall use the following sources of randomness:

The underlying source of random numbers for the first form of the function is implementation-defined. An implementation may use the rand function from the standard C library. [...]

Therefore, if you want to make sure you are writing portable code, use the version of random_shuffle() that accepts a random number generator as a third argument, so that you have control over the seeding.

Andy Prowl
  • 124,023
  • 23
  • 387
  • 451
11

As others have shown, re-seeding with the same seed should allow you to replicate the same shuffle multiple times. However, if you can use C++11 I'd recommend implementing this without using srand() and random_shuffle(); instead you should use the <random> library with std::shuffle.

First, if possible rand should be avoided. Aside from the fact that it may not be a very good pRNG, it also has problems with thread safety due to shared state. The <random> library fixes both these problems by giving the programmer explicit control over pRNG state and by providing several options with guaranteed performance, size, and quality characteristics.

Secondly, random_shuffle isn't actually specified to use rand so it's theoretically legal for reseeding using srand not to have the effect you want. To get guaranteed results with random_shuffle you have to write your own generator. Moving to shuffle fixes that, as you can directly use standard engines.

#include <algorithm> // shuffle, copy
#include <iostream>  // cout
#include <iterator>  // begin, end, ostream_iterator
#include <numeric>   // iota
#include <random>    // default_random_engine, random_device
#include <vector>    // vector

int main() {
  std::vector<int> v1(10);
  std::iota(begin(v1), end(v1), 1);
  auto v2 = v1;

  std::random_device r;
  std::seed_seq seed{r(), r(), r(), r(), r(), r(), r(), r()};

  // create two random engines with the same state
  std::mt19937 eng1(seed);
  auto eng2 = eng1;

  std::shuffle(begin(v1), end(v1), eng1);
  std::shuffle(begin(v2), end(v2), eng2);

  std::copy(begin(v1), end(v1), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "\n\n";
  std::copy(begin(v2), end(v2), std::ostream_iterator<int>(std::cout, " "));
  std::cout << "\n\n";
}
bames53
  • 86,085
  • 15
  • 179
  • 244
  • 3
    This should have been the accepted answer. – Bull Dec 03 '17 at 02:12
  • considering that random_shuffle was deprecated in C++14 and completely removed in C++17 , now this solutions is the best choice as it uses std::shuffle . – Amir Rasti Aug 31 '23 at 19:29
4

You could create an random access iterator which if its dereferenced returns a std::tuple to references of elements of the corresponding vectors. So you could shuffle them inplace. Or you look at the boost version. So it should look something like this:

std::random_shuffle(
  boost::make_zip_iterator(
    boost::make_tuple(vector1.begin(), vector2.begin())
  ),
  boost::make_zip_iterator(
    boost::make_tuple(vector1.end(), vector2.end()
  ),

);

This shuffles your data inplace, works with more than two vectors and is self documenting if you know what make_zip_iterator does. Of course it should be faster than shuffle two times or use a third vector.

Jan Herrmann
  • 2,717
  • 17
  • 21
  • 1
    This is a good idea for readability (and ranges would make it even more readable) but it's not definite that it would be faster than alternatives. In particular I'm thinking that the different memory access patterns could have an unexpected effect on performance. – bames53 Jun 06 '13 at 19:28
  • @bames53 I think you have in any case an unexpected effect on performance. As I think the purpose of random shuffle negates all efforts to make it cache friendly. But it uses one less vector, which might be more cache friendly and it is O(n). If your are able to reorder your two vectors faster than O(n) The shouffling of a third vector may be faster. But now you have at least 2 simply implementable versions and could be able to messure. I would be intersted in the results. – Jan Herrmann Jun 07 '13 at 07:05
  • +1 As this is a library-only based implementation of the custom swap function suggested in comments and one proposed solution. Would work with a tru std::random_device without creating intermediate index tables. Memory access/performance may not be perfect but for random shuffle this is always an issue! – Crog Mar 12 '19 at 10:13
2

Seed the pseudo-random number generator with a reproducible value before each time you shuffle.

std::srand ( 42 );
std::random_shuffle ( vector1.begin(), vector1.end() );
std::srand ( 42 );
std::random_shuffle ( vector2.begin(), vector2.end() );
StoryTeller - Unslander Monica
  • 165,132
  • 21
  • 377
  • 458
2

If both have to have the same order, why are they separate vectors? The logical solution would be something like:

struct ImageData
{
    Image myImage;
    std::string myLabel;
    //  ...
};

You then have a single vector of ImageData which you shuffle.

James Kanze
  • 150,581
  • 18
  • 184
  • 329
0

Unfortunately, if we use srand we change internal seed-value. I mean, the next random numbers will be predetermined. And, first decision:

std::srand ( 42 );
std::random_shuffle ( vector1.begin(), vector1.end() );
std::srand ( 42 );
std::random_shuffle ( vector2.begin(), vector2.end() );
std::srand ( unsigned ( std::time(0) ) );
// Post-code.

To save rand for post-code.

Second decision - it's Mark Ransom solution - it doesn't call std::srand at all (and, I just notice, it has a higher performance).

Boris
  • 597
  • 2
  • 10
-1

Why don't you write your own shuffle:

for( size_t i = 0 ; i < numitems; ++i )
{
    size_t next = random() % numitems ;
    swap( v1[i], v1[next] );
    swap( v2[i], v2[next] );
}
Diego Sánchez
  • 539
  • 2
  • 12
  • 1
    Perhaps because it is so easy to get wrong (like you did). – James Kanze Jun 06 '13 at 17:22
  • @JamesKanze Could you please elaborate? – Diego Sánchez Jun 06 '13 at 17:24
  • 2
    The algorithm you post doesn't give a truly random shuffle. In particular, it has `pow( maxRandom, numitems )` different possible results, but there are `fact( numitems )` different possible orderings. Since `pow( maxxRandom, numitems )` is not a multiple of `fact( numitems )`, the chance of each ordering cannot be equal. – James Kanze Jun 06 '13 at 17:37
  • 3
    Also it's just generally bad practice to re-implement existing functionality. (Btw, your code could be fixed by doing two things, first generate `next` in the range (i, numitems) instead of [0,numitems); Secondly (and less importantly) using a proper uniform distribution rather than `random() % n`.) – bames53 Jun 06 '13 at 18:38
  • +1 For 'conceptual' solution that would work with std::random_device which may not be deterministic and cannot be expected to repeat the same pattern twice. Just needs implementing properly as mentioned by previous commenters! – Crog Mar 12 '19 at 10:08