Platform-independent random state in scikit-learn train_test_split

Question

Does setting a specific random seed (random_state) when splitting train/test datasets using scikit-learn produce the same initialization of the random number generator (i.e., the same pseudo-random numbers) across different platforms, for instance across different cloud computing instances?

Thanks!

afaik it uses numpy, so https://stackoverflow.com/questions/40676205/cross-platform-numpy-random-seed — Shihab Shahriar Khan, Apr 07 '21 at 12:34

score 0 · Accepted Answer · answered Apr 07 '21 at 12:40

As long as random_state is the same integer on all platforms and they are all running the same versions of numpy and scikit-learn, you should get the exact same splits.
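One way to check this yourself is to compute a fingerprint of the split on each machine and compare the results. This is only a rough sketch: split_fingerprint is a made-up helper name, and the sample count, test_size and seed are arbitrary values chosen for illustration.

```python
import hashlib

import numpy as np
from sklearn.model_selection import train_test_split


def split_fingerprint(n_samples=100, seed=42):
    """Hypothetical helper: hex digest of the test indices picked by train_test_split."""
    indices = np.arange(n_samples)
    _, test_idx = train_test_split(indices, test_size=0.25, random_state=seed)
    # Cast to a fixed-width, little-endian dtype so the hashed bytes do not
    # depend on the platform's default integer size or byte order.
    return hashlib.sha256(test_idx.astype("<i8").tobytes()).hexdigest()


fp_a = split_fingerprint()
fp_b = split_fingerprint()
print(fp_a)
print(fp_a == fp_b)  # same seed, same process: the two digests match
```

Run it on each instance and compare the printed digest. If the digests differ, comparing the installed numpy and scikit-learn versions is a reasonable first step.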

Since scikit-learn turns an integer random_state into a numpy.random.RandomState instance, I think all of scikit-learn's pseudo-random number generation is effectively frozen, because numpy froze the output stream of RandomState.
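To illustrate the point about numpy, here is a small sketch using scikit-learn's public helper sklearn.utils.check_random_state. It assumes that train_test_split routes its seed through the same kind of conversion, which may vary between scikit-learn versions; the seed 0 and length 10 are arbitrary.

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer seed is converted into a legacy numpy RandomState object.
rng = check_random_state(0)
print(type(rng))

# The stream is the same as for a RandomState constructed directly with that seed.
a = check_random_state(0).permutation(10)
b = np.random.RandomState(0).permutation(10)
print(np.array_equal(a, b))
```

Both permutations come from the same seeded RandomState stream, which is the part numpy promises to keep stable.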

You can check the scikit-learn documentation for random_state, which describes it as an int seed or a numpy.random.RandomState instance. numpy's compatibility guarantee is stated in the numpy.random.RandomState documentation.
