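The use of random_state is explained pretty well in the post I referenced in my comment.
As for this specific case of TSNE, random_state seeds the pseudorandom number generator used while optimizing the cost function (for example, to draw the initial map points), so it affects which local minimum of the cost function the algorithm ends up in.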
As documented:
method : string (default: ‘barnes_hut’)
By default the gradient calculation algorithm uses Barnes-Hut approximation running in O(N log N) time.
Also, search for the term "random" in the paper you cited. The first occurrence is:
The gradient descent is initialized by sampling map points randomly from an isotropic Gaussian with small variance that is centered around the origin.
Other occurrences of the word "random" make clear that there is also randomness in choosing the starting landmark points, and hence it can affect which local minimum of the cost function is reached.
This randomness comes from a pseudorandom number generator, which is seeded by the random_state parameter.
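To see this effect directly, here is a minimal sketch, assuming a recent scikit-learn and a small synthetic dataset from make_blobs (the dataset and parameter values are illustrative choices, not taken from the original question). With init="random" the starting map points are drawn from the seeded generator, so the same random_state should give the same embedding, while a different one usually ends in a different layout.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Small illustrative dataset (an assumption; any data would do).
X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=0)

def embed(seed):
    # init="random" makes the starting map points come from the seeded PRNG;
    # learning_rate is set explicitly to avoid version-dependent defaults.
    tsne = TSNE(n_components=2, perplexity=30.0, init="random",
                learning_rate=200.0, random_state=seed)
    return tsne.fit_transform(X)

a = embed(42)
b = embed(42)  # same seed: same starting points, same optimization path
c = embed(7)   # different seed: different starting points

print("same seed, same embedding:", np.allclose(a, b))
print("different seed, same embedding:", np.allclose(a, c))
```

Comparing with np.allclose rather than exact equality keeps the check tolerant of tiny floating-point differences.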
Explanation:
Some algorithms use random numbers, for example to initialize certain parameters (such as the weights being optimized), to split data randomly into train and test sets, or to choose a subset of features.
Now, in programming and software in general, a program that follows fixed steps cannot produce truly random numbers on its own. Random numbers are instead generated by a program, and since that program follows fixed steps, its output is not truly random. That is why such programs are called pseudorandom number generators. To output a different sequence of numbers each time, they take an input from which the numbers are generated. Often this input is the current time (for example, milliseconds since the Unix epoch) or entropy supplied by the operating system. This input is called the seed. Fixing the seed fixes the output numbers.
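The "fixed steps plus a seed" idea can be shown with a toy linear congruential generator. This is an illustrative sketch only (the constants are the commonly cited Numerical Recipes parameters for modulus 2**32); real libraries use much better generators, for example Python's random module uses the Mersenne Twister, but the principle is the same.

```python
import time

class LCG:
    """Toy linear congruential generator; illustrative only, not for real use."""

    def __init__(self, seed):
        self.state = seed % 2**32

    def next_state(self):
        # Fixed, deterministic step: the next state depends only on the current one.
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state

    def random(self):
        # Scale the state to a float in [0, 1).
        return self.next_state() / 2**32


def first_n(seed, n=5):
    gen = LCG(seed)
    return [gen.random() for _ in range(n)]

print(first_n(123) == first_n(123))  # True: same seed, same sequence
print(first_n(123) == first_n(456))  # False: different seed, different sequence

# Seeding from the clock (milliseconds since the Unix epoch) gives a new sequence per run.
clock_seeded = first_n(time.time_ns() // 1_000_000)
```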
random_state is used as the seed for the pseudorandom number generator in scikit-learn, so that the behavior can be reproduced when such randomness is involved in an algorithm. With a fixed random_state, the algorithm will produce exactly the same results in different runs of the program. This makes it easier to debug and identify problems, if any.
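Internally, scikit-learn estimators typically turn random_state into a generator with sklearn.utils.check_random_state, which accepts None, an int, or a numpy.random.RandomState instance. A small sketch of how the three cases behave, assuming a recent scikit-learn:

```python
import numpy as np
from sklearn.utils import check_random_state

# int: a fresh RandomState seeded with that int, so the draws repeat exactly.
a = check_random_state(42).rand(3)
b = check_random_state(42).rand(3)

# RandomState instance: returned as-is, so successive calls share (and advance) its state.
rng = np.random.RandomState(42)
c = check_random_state(rng).rand(3)
d = check_random_state(rng).rand(3)

# None: NumPy's global RandomState is used, whose state depends on everything
# else that has drawn from it, so results are not reproducible across runs.
e = check_random_state(None).rand(3)

print(np.array_equal(a, b))  # True
print(np.array_equal(a, c))  # True: rng was seeded with 42 and this is its first draw
print(np.array_equal(c, d))  # False: rng's state advanced between the calls
```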
Without setting random_state, a different seed will be used each time the algorithm is run, and you will get different results. You may get a very high score the first time and never be able to achieve it again.
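One way to see how much a score depends on the seed is to loop over several seeds explicitly. A minimal sketch, assuming scikit-learn, synthetic data from make_classification, and a DecisionTreeClassifier (all illustrative choices, not from the question); reporting the mean and spread over seeds gives a fairer picture than a single lucky run.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an illustrative assumption); its generation is seeded as well.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def score_for(seed):
    # Both the train/test split and the tree's randomness depend on the seed.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

scores = [score_for(seed) for seed in range(10)]
print(scores)
print(f"mean={np.mean(scores):.3f} std={np.std(scores):.3f}")
```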
Now, in machine learning, we want to be able to replicate our steps exactly as they were performed before, in order to analyse the results. Hence random_state is fixed to some integer. Hope it helps.
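In practice this often means defining one integer and passing it to every step that involves randomness. A minimal sketch, assuming scikit-learn; the SEED constant, the run_experiment helper, and the choice of RandomForestClassifier are hypothetical and only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # hypothetical constant; any fixed integer works

def run_experiment(seed=SEED):
    # Every step that involves randomness receives the same explicit seed.
    X, y = make_classification(n_samples=200, n_features=8, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    model = RandomForestClassifier(n_estimators=20, random_state=seed).fit(X_tr, y_tr)
    return model.score(X_te, y_te), model.predict_proba(X_te)

score_1, proba_1 = run_experiment()
score_2, proba_2 = run_experiment()
print(score_1 == score_2, np.allclose(proba_1, proba_2))
```

Running the script again, or on another machine with the same library versions, should then reproduce the same numbers.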