Does setting a specific random seed (random_state) when splitting a dataset into train and test sets with scikit-learn initialize the random number generator identically (i.e., produce the same pseudo-random numbers) across different platforms, for instance on different cloud computing instances?

Thanks!

Nicola Fanelli
gisk

1 Answer

As long as random_state is set to the same value on all platforms and they are all running the same versions of numpy and scikit-learn, you should get exactly the same splits.
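If you want to verify this yourself, here is a minimal sketch that hashes the indices produced by train_test_split so you can run it on each instance and compare the printed values. The helper name split_fingerprint, the sample count, and the seed are my own choices for illustration, not anything from scikit-learn.

```python
import hashlib

import numpy as np
from sklearn.model_selection import train_test_split


def split_fingerprint(n_samples=100, test_size=0.25, seed=42):
    """Return a short hash of the train/test indices for a fixed seed.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    indices = np.arange(n_samples, dtype=np.int64)
    train_idx, test_idx = train_test_split(
        indices, test_size=test_size, random_state=seed
    )
    # Force little-endian 8-byte integers so the bytes are comparable
    # across machines with different native layouts.
    payload = np.concatenate([train_idx, test_idx]).astype("<i8").tobytes()
    return hashlib.sha256(payload).hexdigest()[:16]


if __name__ == "__main__":
    # Run this on each platform and compare the printed fingerprints.
    print(split_fingerprint())
```

If the fingerprints match across instances, the splits are identical. If they differ, I would first compare the installed numpy and scikit-learn versions.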

Since scikit-learn turns random_state into a numpy.random.RandomState instance, I think the output of all of scikit-learn's pseudo-random number generators is stable, because numpy has frozen the RandomState stream.
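To illustrate the point about RandomState, here is a small sketch using sklearn.utils.check_random_state, the utility that converts an integer seed into a generator object. The function name seeded_permutations and the values used are just assumptions for the example.

```python
import numpy as np
from sklearn.utils import check_random_state


def seeded_permutations(seed=42, n=10):
    """Draw one permutation from a scikit-learn-built generator and one
    from a plain numpy RandomState seeded with the same value.

    Hypothetical helper for illustration only.
    """
    sk_rng = check_random_state(seed)    # generator as scikit-learn builds it
    np_rng = np.random.RandomState(seed)  # plain numpy legacy generator
    return sk_rng, sk_rng.permutation(n), np_rng.permutation(n)


if __name__ == "__main__":
    rng, sk_perm, np_perm = seeded_permutations()
    print(type(rng))
    print(np.array_equal(sk_perm, np_perm))
```

Both generators are seeded identically, so the two permutations should agree, which is why numpy's guarantee on RandomState carries over to the seeded splits.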

You can check the scikit-learn documentation for random_state, which shows that it is backed by numpy.random.RandomState. numpy's compatibility guarantee for RandomState is described in the numpy documentation.

Arturo Sbr