Random number quality with given seed in haskell

Question

I'm playing with procedural genaration in a small ascii game and I encountered this issue with random numbers in haskell. The basic idea is to provide a random number seeded with (x,y) of some part of game world to for example decide if there's a tree there or not (this guy explains it nicely)

This is what I get when trying a different seed for each generation:

randomFromSeed :: Int -> Int -> Int
randomFromSeed max seed = fst (randomR (0, max - 1) (mkStdGen seed))

Prelude> map (randomFromSeed 10) [1..20]
[5,9,3,7,1,5,9,3,7,1,5,9,3,7,1,5,9,3,7,1]

It clearly has a period of 5, but on the other hand on the mkStdGen docs it says:

The function mkStdGen provides an alternative way of producing an initial generator, by mapping an Int into a generator. Again, distinct arguments should be likely to produce distinct generators.

So how come, there seem to be only 5 distinct generators coming?

How can I get those to be truly random when given different seeds?

Edit For some weird reason using bigger numbers makes it better:

Prelude> let mult = 1000000 in map (randomFromSeed 10) [0,mult .. 20*mult]
[3,7,0,6,9,2,8,1,4,0,3,9,2,5,1,4,7,3,6,9,5]

Why exactly do you need to use a separate seed per random number? You don't need to make a new `StdGen` everytime. `randomR` already provides you a new `StdGen` at the `snd` of it's return value and the common practice is to use that one to get the next random value.. — Redu, May 23 '20 at 09:42
I want to have a mapping that gives me some randomized value for each point (x,y) on the game map so I can move around an infinite world but be able to go back to places that I already visited. I always have the visible points at hand so I can recreate the same random map this way without remembering anything. — kajman, May 23 '20 at 09:48
So i guess you want to generate a persistent random number per coordinate so you think you need coordinate many seeds right? In Haskell this is not a proper approach as it would be in some other languages. What you can do is to generate coordinates many persistent random numbers in a list all at once from a single seed. Only then you may map your coordinates to those randoms. If you are interested i may come up with an answer. — Redu, May 23 '20 at 09:55
The mathematicians who design pseudo-random number generators make absolutely no claims about taking the first random number of every sequence when using a linear sequence of seeds. Typically, the user starts from a single seed passed from the command line, and chains the state of the random number generator thru the computations, either manually or thru monadic techniques. It is similar to using a state monad. See answers to this [question](https://stackoverflow.com/questions/57836652/how-can-i-generate-different-random-values-in-haskell) for example. — jpmarinier, May 23 '20 at 10:05
@Redu But how this would scale? Say I generate numbers for the whole (x, 0) line. I have to start the generation at some point, right? If I start generating for points [(0,0), (1,0),...] how can I reliably get something for (-1, 0) this way? Would I need two separate lists for each y? — kajman, May 23 '20 at 10:12
Do you have a finite coordinate system or is it an infinite universe? For a finite one such as say 1024 x 1024 you can make a `randomsList` of length 1024.1024 = 1048576 and then make a `chunksOf 1024 randomsList` to get a 2D list and you may perhaps convert it into a [Grid Structure](http://hackage.haskell.org/package/grids-0.5.0.1/docs/Data-Grid.html). If it is infinite then laziness of Haskell will be your friend. In this case you may check [this nice article](http://blog.sigfpe.com/2006/12/evaluating-cellular-automata-is.html). In any case you will probably end up using the State monad. — Redu, May 23 '20 at 10:38
@Redu Other problem with this solution that I see is that if I'm already far out, lets say at (1000,1000) and want to map all the points in close distance I would have to go through the sequence somewhere up to 1000th element. Am I missing something? So the further from (0,0) I go the slower the generation would become (assuming we do not store anything apart from current coords). — kajman, May 23 '20 at 10:39
Say a while ago, your player has visited point (7,9) and found an oak tree there. If, later after some random walk, the player returns to (7,9), the decision about which sort of tree to see has to be exactly the same as for the first visit ? And this preferably without having to memorize previous decisions inside some data structure ? — jpmarinier, May 23 '20 at 12:09
@jpmarinier Yes, that's precisely the point. The guy from the first link I posted explains it very nicely (unfortunately its about 40 minutes to watch). He does this in C# and has similar issues that I have at the beginning and then changes to some other RNG and everything is fine later. I'm trying to replicate this in Haskell to learn the language a bit and explore this approach. — kajman, May 23 '20 at 12:44
So as i understand you don't even need a data structure here since you will be holding no data at all but recreating it everytime you visit that particular coordinate. Then just a pair of `Int` to define the coordinate should be sufficient. Perhaps you can make a pairing function like `seed x y = (89^x)*(97^y)` to get a unique seed per coordinate (make sure that the bases are two distinct primes where powers are the x & y from coord). Then perhaps `randomR (1,10) . mkStdGen $ seed 7 9`. — Redu, May 23 '20 at 12:45
@Redu this works much better, thanks. I had some problems coming up with a nice function from (x,y) -> Int also so you fixed both problems at once. The seeds now differ enough and it looks nice. In case you guys wanted to see the whole picture: https://github.com/kajmaj87/lost_in_the_woods . I tagged the code from question as stack_question for future reference. — kajman, May 23 '20 at 13:22
To convert a (x,y) point to a single number, there is also the well-known and quite simple [Cantor pairing function](https://en.wikipedia.org/wiki/Pairing_function) from set theory. — jpmarinier, May 23 '20 at 14:42
@jpmarinier Thanks, this looks very interesting, but is there an easy way to extend it to the whole plane, not just NxN ?I can could easily do N- x N- -> N- (negative numbers) but what about N- x N or N x N- ? — kajman, May 23 '20 at 16:26
Ok, I think I know how to do it. I could first map integer to natural number (negative number to -2n and every positive number to 2n-1) and then apply Cantor pairing function to them. — kajman, May 23 '20 at 18:13
You can avoid the unexpected behavior you see by using pcg-random instead of random. Demo: https://github.com/dc25/stackoverflow-random-number-quality-with-given-seed-in-haskell/blob/master/app/Main.hs — Dave Compton, May 23 '20 at 23:41
@DaveCompton Thanks, this looks very promising and solves my problem exactly. — kajman, May 24 '20 at 05:31
[deja vu](https://www.reddit.com/r/elm/comments/44thxz/changing_argument_to_randominitialseed_does_not/) — Dave Compton, May 24 '20 at 12:33
@kajman - yes, you can use an odd/even test to extend the Cantor pairing function to all 4 quadrants. See an alternative way in my answer. BTW it would be nice if you could find the time to copy some information from your comments into the text of your question, and possibly tweak the question title accordingly. This would make it easier for users with similar problems to find your question. Thanks ! — jpmarinier, May 25 '20 at 13:58

score 3 · Answer 1 · answered May 25 '20 at 13:53

So how come, there seems to be only 5 distinct generators coming ?

It is an illusion to think there are just 5 generators. If you print the second numbers of each sequence instead of the first ones, you get this:

random2ndFromSeed :: Int -> Int -> Int
random2ndFromSeed max seed =
    let   g0       = mkStdGen seed
          (v1, g1) = randomR (0, max - 1) g0
          (v2, g2) = randomR (0, max - 1) g1
    in   v2

 λ> 
 λ> map  (random2ndFromSeed 10)  [1..40]
[6,9,3,8,1,4,8,3,6,9,3,8,1,4,8,3,6,9,3,8,1,4,8,3,6,9,3,8,1,4,8,3,6,9,3,8,1,4,8,3]
 λ>

So the periodicity appears then to be 8 instead of 5 !

A way to get rid of the apparent problem is to replace the standard generator by the Threefish one, which is of more recent design and has better statistical properties. Or you can also use pcg-random as mentioned by Dave Compton.

import  System.Random.TF

tfRandomFromSeed :: Int -> Int -> Int
tfRandomFromSeed max seed = let   g0 = mkTFGen seed
                            in    fst $ randomR (0, max - 1) g0

 λ> 
 λ> map  (tfRandomFromSeed 10)  [1..40]
[4,5,6,7,5,3,3,0,0,4,2,8,0,4,1,0,0,1,3,5,6,4,3,6,4,0,3,6,4,0,2,4,5,9,7,3,8,5,2,4]
 λ>

More generally, the appearance of randomness is supposed to come from repeated applications of the generator next function. Here, the function is only applied once per seed/sequence, so there is no claim to randomness.

How to create a persistent 2D random field

From the comments, the actual need is of a “random” function of a point in 2D space. If the player, after some random walk, returns to some already visited point, it is expected to find the same random value as before, and this without memorizing previous random values.

And to achieve this in a way that gets us some guarantees about the statistical properties of the random values, we need to do it with a single seed and a single random sequence; for thats' what our applied mathematicians are testing.

We need two things to produce such a persistent two-dimensional random field:

a way to map a 2D point to a single natural number, used as an offset into a random sequence
a random number generator with efficient access to arbitrary points of each sequence

Mapping 2D points to natural numbers

This can be done for example by leveraging the Cantor Pairing Function from elementary set theory.

We can use this code:

-- limited to first quadrant, x >= 0 and y >= 0:
cantor1 :: Int -> Int -> Int
cantor1 x y = y + (let s = x + y  in  div  (s * (s+1))  2)

-- for all 4 quadrants:
cantor :: (Int, Int) -> Int
cantor (x,y) =
    let quadrant
          | x >= 0  &&  y >= 0   =  0
          | x <  0  &&  y >= 0   =  1
          | x <  0  &&  y  < 0   =  2  
          | x >= 0  &&  y <  0   =  3
          | otherwise            =  error  "cantor: internal error #1"
        cant1
          | x >= 0  &&  y >= 0   =  cantor1     x      y
          | x <  0  &&  y >= 0   =  cantor1  (-1-x)    y
          | x <  0  &&  y  < 0   =  cantor1  (-1-x)  (-1-y)
          | x >= 0  &&  y <  0   =  cantor1     x    (-1-y)
          | otherwise            =  error  "cantor: internal error #2"
    in
         4*cant1 + quadrant

Arranging arbitrary access

With this preliminary step out of the way, we have to recognize that the regular Haskell random number generation API is not well suited to the task at hand.

The API provides sequential access to the random sequence thru the next function. But there is no arbitrary access, such as provided in the C++ random library by the discard function. And the classic monadic style using the MonadRandom interface is all about sequential access. It's basically like a state monad.

Furthermore, with some random number generators, efficient access to an arbitrary point of the sequence is simply impossible. In such a case, the C++ discard function just uses costly single stepping to get to the wanted point.

Fortunately, there is a Haskell implementation of Pierre L'Ecuyer et al MRG32k3a random number generator.

With MRG32k3a, arbitrary access into the random sequence boils down to exponentiation of small matrices in 2 Galois fields. Thanks to the ancient and revered Indian exponentiation algorithm, this can be done in O(log n) time.

The MRG32k3a code in github does not provide a full Haskell style interface, such as a RandomGen instance, so we have to add a bit of wrapper code around it.

First, we need some import clauses:

import  System.Random
import  System.Random.TF
import qualified  Data.List           as  L
import qualified  Text.Printf         as  TP
import qualified  Data.Text           as  TL
import qualified  Data.ByteString     as  BS
import qualified  Data.Text.Encoding  as  TSE
import qualified  Crypto.Hash.SHA256  as  SHA
import qualified  System.Random.MRG32K3A.Simple as MRG

and then the wrapper code itself:

newtype MRGen = MRGen MRG.State  -- wrapper type for MRG32k3a generator
                deriving  Show

instance RandomGen  MRGen  where
    genRange = let  mrg32k3a_m1 = ((2::Integer)^32 - 209)
               in   const  (0::Int, fromIntegral (mrg32k3a_m1 - 1))

    next (MRGen g0) = let  (v, g1) = MRG.next g0
                      in   ((fromIntegral v)::Int, MRGen g1)

    split (MRGen g0) = let  g1 = MRG.advance ((2::Integer)^96) g0
                       in   (MRGen g0, MRGen g1) 

mkMRGen :: Int -> MRGen
mkMRGen userSeed = let  longSeed = hashSeed userSeed
                        g0       =  MRG.seed longSeed
                   in   MRGen g0

ranSeek :: MRGen -> Integer -> MRGen
ranSeek (MRGen g0) count =  let  g1 = (MRG.advance count g0)  in   MRGen g1

hashSeed :: Int -> Integer
hashSeed userSeed =
    let str   = "MRG32k3a:" ++ (TP.printf "0x%x" userSeed)
        bytes =  (TSE.encodeUtf8 . TL.pack) $ str
        ints  = (map (fromIntegral) $ BS.unpack (SHA.hash bytes)) :: [Integer]
    in
        L.foldl'  (\acc d -> acc*256 + d)  0  (take 20 ints)

Function mkMRGen is similar to mkStdGen. Arbitrary access into the random sequence is provided by function ranSeek :: MRGen -> Integer -> MRGen in O(log n) time.

Side note: I am rehashing the user-provided seed in mkMRGen. this is because the github package uses its seed as just an offset into the random sequence. So in order to avoid the risk of sequence overlap for small user seeds, I need to generate a large number from the user seed.

Thanks to our RandomGen instance, we have access to usual functions such as random :: RandomGen g => g -> (a, g). For example, we can generate a 2D random field of type Double from a simple Int seed like this:

randomDoubleField :: Int -> (Int, Int) -> Double
randomDoubleField userSeed (x,y) =
    let  k  = 1  -- number of needed random values per plane point
         g0 = mkMRGen userSeed
         g1 = ranSeek  g0  (fromIntegral (k * cantor (x,y)))
    in   fst (random g1)

Now that we have that little toolkit, we can write a small test program, drawing some random landscape for a neighboorhood of point zero, with one character per 2D point.

Say, character 't' stands for one type of tree and 'T' for another type of tree. The absence of tree is denoted by a minus sign.

Main program:

randomCharField :: Int -> (Int, Int) -> Char
randomCharField  userSeed  (x,y) =
    let  n = floor (8.0 * randomDoubleField userSeed (x,y) )
    in   "------tT"  !!  n


rowString :: Int -> Int -> Int -> String
rowString userSeed size y =
               let  xRange = [(-size) .. size]
               in   map  (randomCharField userSeed)  [ (x,y) | x <- xRange ]


main = do
    let  userSeed = 42
         size     = 6
         yRange   = [(-size) .. size]
    mapM_  (putStrLn . (rowString userSeed size))  yRange

Program output:

--t-T----TT-t
------t-----T
-T--T--T-----
--t-T--tTTT--
--T--t---T---
t-Tt------t--
-T-----t-T---
-T-t-t----T--
tT-tT---tT--t
---TTt---t---
-------T---t-
--t---------t
-tT-t---t----

Optimization note: If performance is a concern, you probably want to move the (mkMRGen userSeed) computation out of the loops.

Thank you very much for your extensive answer. – kajman May 25 '20 at 15:43 — kajman, May 25 '20 at 15:43

score 2 · Accepted Answer · answered May 24 '20 at 12:26

You can avoid the unexpected behavior you see by using pcg-random instead of random :

import System.Random.PCG 
import Control.Monad.ST

randomFromSeed :: Int -> Int -> Int
randomFromSeed max seed = runST $ do
  g <- initialize (fromIntegral seed) 0
  uniformR (0, max - 1) g

main :: IO ()
main = print $ map (randomFromSeed 10) [1..20]

pcg-random has other nice properties too.