How to ensure good, different initial NumPy MT19937 states?

Question

Introduction - legacy NumPy

The legacy NumPy code of initializing MT19937 instances (same as on Wikipedia) ensured that different seed values lead to different initial states (or at least if a single int is provided). Let's check the first 3 numbers in the prng's state:

np.random.seed(3)
np.random.get_state()[1][:3]
# gives array([         3, 1142332464, 3889748055], dtype=uint32)

np.random.seed(7)
np.random.get_state()[1][:3]
array([         7, 4097098180, 3572822661, 1142383841], dtype=uint32)
# gives array([         7, 4097098180, 3572822661], dtype=uint32)

However, this method is criticized for 2 reasons:

seed size is limited by the underlying type, uint32
similar seeds may result in similar random numbers

The former can be solved if one can provide a sequence of int (which is indeed implemented, but how?), but the latter is harder to address. The implementation of the legacy code has been written keeping this property in mind [^1].

Introduction - new NumPy random

In the new implementation, the seed value provided is hashed first, then used to feed the initial state of the MT19937. This hashing ensures that

the similarity of the seed values doesn't matter, 2 similar seed values produce different initial state with the same probability as non-similar seed values. Previously we have seen that, for adjacent seed values, the first state variable (out of 600+) is similar. Whereas in the new implementation, not a single similar value can be found (with high chance) except for the first one for some reason:
```
prng = np.random.Generator(np.random.MT19937(3))
prng.bit_generator.state["state"]["key"][:3]
# gives array([2147483648, 2902887791,  607385081], dtype=uint32)

prng = np.random.Generator(np.random.MT19937(7))
prng.bit_generator.state["state"]["key"][:3]
# gives array([2147483648, 3939563265, 4185785210], dtype=uint32)
```
Two different seed values (ints of any length) may result in a similar initial state with a probability of $2^{-128}$ (by default).

If the problem of similar seeds has been already solved by Matsumoto et el., [^1], then there was no need to use a hash function, which introduces the state collision problem.

Question

Given the new implementation in NumPy, is there a good practice that ensures the different initial states of the MT19937 instances and passes quality requirements when it comes to similar seed values? I am looking for an initialization method that consumes at least 64 bits.

How about modifying the generate_state output of the SeedSequence class: if two ints are given, replace the first 2 states (maybe except the first one) with the given seed values themselves:

class secure_SeedSequence(np.random.SeedSequence):
    def __init__(self, seed1: np.uint32, seed2: np.uint32):
        self.seed1 = seed1
        self.seed2 = seed2

    def generate_state(self, n_words, dtype):
        ss = np.random.SeedSequence([self.seed1, self.seed2])
        states = ss.generate_state(n_words, dtype)
        states[1] = self.seed1
        states[2] = self.seed2
        return states


ss_a = secure_SeedSequence(3, 1)
prng_a = np.random.Generator(np.random.MT19937(ss_a))
# gives [2147483648          3          1  354512857 3507208499 1943065218]

ss_b = secure_SeedSequence(3, 2)
prng_b = np.random.Generator(np.random.MT19937(ss_b))
# gives [2147483648          3          2 2744275888 1746192816 3474647608]

Here secure_SeedSequence consumes 2*32=64 bits, prng_a and prng_b are in different states, and except for the first 3 state variables, all the state variables are not alike. According to Wikipedia, the first 2 numbers may have some correlation with the 2 first state-variables, but after generating 624 random numbers, the next internal state won't reflect the initial seeds anymore. To avoid this problem, the code can be improved by skipping the first 2 random numbers.

Workaround

One can claim that the chances that two MT19937 instances will have the same state after providing different entropy for their SeedSequence is arbitrary low, by default, it is $2^{-128}$. But I am looking for a solution that ensures 100% probability that the initial states are different, not only with $1-2^{-32\cdot N}$ probability.

Moreover, my concern with this calculation is that although the chance of getting garbage streams are low, once we have them, they produce garbage output forever, therefore, if a stream of length $M$ is generated, and $N$ streams/prngs are used, then by selecting $M$ pieces of numbers from this $M \times N$ 2D array, the chances that a number is garbage, tends to 1.

Why I asked it here?

this is strongly related to a given implementation in NumPy
the chances that I get an answer is the highest here
I think this is a common issue and I hope others have investigated this topic deeply already

[^1]: Common Defects in Initialization of Pseudorandom Number Generators, MAKOTO MATSUMOTO et al., around equation 30.