
Let's assume I have an array containing 2 million ids, and I want to retrieve a sample of these ids. At the moment I use random sampling, as proposed in the answer to this question.

private static void shuffleScoreArray(ScoreDoc[] ar) {
    Random rnd = new Random();
    for (int i = ar.length - 1; i > 0; i--) {
        int index = rnd.nextInt(i + 1);
        // Simple swap
        ScoreDoc a = ar[index];
        ar[index] = ar[i];
        ar[i] = a;
    }
}

This works well, but how can I retrieve a non-random (and reasonably well-distributed, though not necessarily perfectly uniform) sample? Non-random here means that if I call the function twice with the same input array, I get the same sample both times.

I have done a lot of research on SO and Google but couldn't find an approach that helps in this case. Most answers on SO deal with random sampling or with performance improvements.

One idea (I don't know whether it would work) is to always use the same Random object, but I'm unsure how to turn this into Java code that works as intended.

Thanks a lot for any thoughts and answers you share.

Waylander

2 Answers


Pass a seed to the RNG. Instead of this:

Random rnd = new Random();

Use this:

Random rnd = new Random(12345L);

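A java.util.Random created with a given seed always produces the same sequence of values, so the shuffle (and therefore the sample) will be identical every time for the same input array and the same seed.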
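A minimal sketch of the seeded approach, assuming the ids can be held in a `long[]` rather than the `ScoreDoc[]` from the question; the class name `SeededSampler`, the method `sample`, and the constant `DEFAULT_SEED` are hypothetical. It uses the same Fisher-Yates loop as the question, but on a copy and with a freshly seeded `Random` per call, then keeps the first `k` ids.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Deterministic "random-looking" sample: shuffle a copy of the ids with a
 * fixed seed and keep the first k. Same input + same seed gives the same sample.
 */
final class SeededSampler {

    // Hypothetical default seed; any fixed long value works.
    static final long DEFAULT_SEED = 12345L;

    static long[] sample(long[] ids, int k, long seed) {
        if (k < 0 || k > ids.length) {
            throw new IllegalArgumentException("k must be between 0 and ids.length");
        }
        // Work on a copy so the caller's array is left untouched.
        long[] copy = Arrays.copyOf(ids, ids.length);
        // Create a NEW seeded Random on every call. Reusing one shared instance
        // across calls would advance its state and give a different sample each time.
        Random rnd = new Random(seed);
        // Fisher-Yates shuffle, same loop as in the question.
        for (int i = copy.length - 1; i > 0; i--) {
            int index = rnd.nextInt(i + 1);
            long tmp = copy[index];
            copy[index] = copy[i];
            copy[i] = tmp;
        }
        // The first k shuffled ids form the sample.
        return Arrays.copyOf(copy, k);
    }
}
```

Calling `SeededSampler.sample(ids, 1000, SeededSampler.DEFAULT_SEED)` twice on the same array returns equal arrays; changing the seed gives a different, but again reproducible, sample.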

Patrick Collins

Since you want to get the same result every time you pass in the same array, why not take every nth record from the array? You can calculate n by dividing the array length by the sample size you want.

However, this method will not guarantee a good distribution unless you sort the array first.
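A minimal sketch of the every-nth (systematic) approach, assuming `long[]` ids; the class name `EveryNthSampler` and method `sample` are hypothetical. It sorts a copy of the array, as suggested above, sets n = `ids.length / k` (integer division), and picks indices 0, n, 2n, and so on.

```java
import java.util.Arrays;

/**
 * Deterministic systematic sample: sort a copy, then take every nth id,
 * where n = ids.length / k (integer division).
 */
final class EveryNthSampler {

    static long[] sample(long[] ids, int k) {
        if (k <= 0 || k > ids.length) {
            throw new IllegalArgumentException("k must be between 1 and ids.length");
        }
        // Sort a copy first so the picks are spread over the whole id range.
        long[] sorted = Arrays.copyOf(ids, ids.length);
        Arrays.sort(sorted);
        // n from the answer; at least 1 because k <= ids.length.
        int step = ids.length / k;
        long[] out = new long[k];
        for (int j = 0; j < k; j++) {
            // (k - 1) * step is always < ids.length, so this index is in bounds.
            out[j] = sorted[j * step];
        }
        return out;
    }
}
```

For example, with 2,000,000 ids and k = 1,000, the step is 2,000 and the picked indices are 0, 2,000, ..., 1,998,000. When the length is not a multiple of k, the last `ids.length % k` sorted ids are never chosen; for a sample this small relative to the array that is usually acceptable.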

Sampath