I want to replace an existing random-number-based data generator (in Python) with a hash-based one, so that it no longer needs to generate everything in sequence, as inspired by this article.
I can create a float from 0 to 1 by taking the integer version of the hash and dividing it by the maximum value of a hash.
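To make that concrete, here is a minimal sketch of the hash-to-float step. It assumes SHA-256 from hashlib as the hash and a hypothetical helper name, hash_to_float; neither is prescribed by anything above. It keeps only 53 bits of the digest, because dividing a full-width hash by its maximum can round up to exactly 1.0 in floating point.

```python
import hashlib

def hash_to_float(*keys):
    """Map a tuple of keys to a float in [0, 1). Hypothetical helper."""
    # repr() of ints and strings is stable between runs, unlike the
    # built-in hash() of a str, which is salted per process.
    digest = hashlib.sha256(repr(keys).encode("utf-8")).digest()
    # Keep the top 53 bits of a 64-bit slice: every such integer is
    # exactly representable as a double, so the quotient stays below 1.0.
    bits = int.from_bytes(digest[:8], "big") >> 11
    return bits / (1 << 53)

u = hash_to_float(1, 2, 7)
print(u)
```

The same keys always give the same float, regardless of what else has been generated before.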
I can create a flat integer range by taking the float and multiplying by the flat range. I could probably use modulo and live with the bias, as the hash range is large and my flat ranges are small.
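Both approaches can be sketched as below. The helper names (hash64, hashint_scaled, hashint_modulo) are made up for illustration, and SHA-256 truncated to 64 bits is an assumed choice of hash. With a 64-bit hash and a range of around 100 values, either method's bias is on the order of span / 2**64, which is far too small to observe in practice.

```python
import hashlib

def hash64(keys):
    """Return a 64-bit unsigned integer derived from a sequence of keys."""
    digest = hashlib.sha256(repr(tuple(keys)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def hashint_scaled(keys, low, high):
    """Integer in [low, high] by scaling the hash into the range."""
    span = high - low + 1
    # Pure integer arithmetic: equivalent to floor(h / 2**64 * span).
    return low + (hash64(keys) * span >> 64)

def hashint_modulo(keys, low, high):
    """Integer in [low, high] by taking the hash modulo the range size."""
    span = high - low + 1
    return low + hash64(keys) % span

print(hashint_scaled([1, 1], 1, 100), hashint_modulo([1, 1], 1, 100))
```

The two functions generally return different values for the same keys, so it matters only that one of them is chosen and used consistently.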
How could I use the hash to create a Gaussian (normally distributed) floating-point value?
For all of these cases, would I be better off simply using my hash as the seed for a new random.Random object, and relying on the functions in that class to get the distribution characteristics right?
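One possible approach, sketched here under assumptions, is inverse transform sampling: turn the hash into a uniform value strictly between 0 and 1, then push it through the inverse CDF of the normal distribution. The sketch uses statistics.NormalDist.inv_cdf from the standard library (Python 3.8+); the helper name hash_to_gauss and the use of SHA-256 are illustrative choices.

```python
import hashlib
from statistics import NormalDist

def hash_to_gauss(keys, mu=0.0, sigma=1.0):
    """Normally distributed float derived from keys. Hypothetical helper."""
    digest = hashlib.sha256(repr(tuple(keys)).encode("utf-8")).digest()
    bits = int.from_bytes(digest[:8], "big") >> 12   # keep 52 bits
    # The half-step offset keeps u strictly inside (0, 1); inv_cdf
    # raises StatisticsError for exactly 0 or 1.
    u = (bits + 0.5) / (1 << 52)
    return NormalDist(mu, sigma).inv_cdf(u)

height_cm = hash_to_gauss([1, 2, 7, "height"], mu=170, sigma=10)
print(height_cm)
```

A Box-Muller transform on two hash-derived uniforms would be an alternative, at the cost of needing two independent hash values per sample.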
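As a sketch of that alternative, the hash can be turned into an integer seed for a throwaway random.Random instance, one per value being generated. The helper name rng_for is hypothetical, and SHA-256 is an assumed choice; the built-in hash() is avoided because string hashes are salted per process unless PYTHONHASHSEED is fixed.

```python
import hashlib
import random

def rng_for(*keys):
    """Return a random.Random seeded from a stable hash of the keys."""
    digest = hashlib.sha256(repr(keys).encode("utf-8")).digest()
    # random.Random accepts an arbitrarily large int as a seed.
    return random.Random(int.from_bytes(digest, "big"))

# Each value gets its own generator, so order of generation is irrelevant.
count = rng_for(1, "peoplecount").randint(1, 100)
height_cm = rng_for(1, "people", 7, "height").gauss(170, 10)
print(count, height_cm)
```

This trades some speed (seeding a Mersenne Twister per value is not free) for access to every distribution the random module already provides.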
At the moment, my code is structured like this:
    from random import randint, choice

    num_people = randint(1, 100)
    people = [dict() for x in range(num_people)]
    for person in people:
        person['surname'] = choice(surname_list)
        person['forename'] = choice(forename_list)
The problem is that for a given seed to be consistent, I have to generate all the people in the same order, and I have to generate the surname then the forename. If I add a middle name in between the two then the generated forenames will change, as will all the names of all the subsequent people.
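The order dependence can be reproduced with a small self-contained sketch. The name lists and the make_people function are made up for illustration; the only difference between the two runs is whether a middle name is drawn between the surname and the forename.

```python
import random

# Hypothetical sample data, for illustration only.
surnames = ["Smith", "Jones", "Patel", "Garcia", "Nguyen"]
forenames = ["Ann", "Bob", "Cy", "Dee", "Eve", "Flo", "Gus", "Hal"]

def make_people(seed, with_middle):
    """Generate three people sequentially from one seeded generator."""
    rng = random.Random(seed)
    people = []
    for _ in range(3):
        person = {"surname": rng.choice(surnames)}
        if with_middle:
            # This extra draw shifts every later draw in the sequence.
            person["middle"] = rng.choice(forenames)
        person["forename"] = rng.choice(forenames)
        people.append(person)
    return people

print(make_people(42, with_middle=False))
print(make_people(42, with_middle=True))
```

Only the first surname is guaranteed to match between the two runs; everything drawn after the inserted middle name comes from a different point in the sequence and will usually differ.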
I want to structure the code like this:
    h1_groupseed = 1
    h2_peoplecount = 1
    h2_people = 2
    h4_surname = 1
    h4_forename = 2
    num_people = pghash([h1_groupseed, h2_peoplecount]).hashint(1, 100)
    people = [dict() for x in range(num_people)]
    for h3_index, person in enumerate(people, 1):
        person['surname'] = surname_list[pghash([h1_groupseed, h2_people, h3_index, h4_surname]).hashint(0, num_of_surnames - 1)]
        person['forename'] = forename_list[pghash([h1_groupseed, h2_people, h3_index, h4_forename]).hashint(0, num_of_forenames - 1)]
This would use the values passed to pghash to generate a hash, and use that hash to somehow create the pseudorandom result.
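For illustration, a minimal sketch of what such a pghash class might look like follows. Only the constructor argument and the hashint(low, high) method come from the code above; the SHA-256 hash, the key encoding, the modulo reduction, and the sample name lists are all assumptions.

```python
import hashlib

class pghash:
    """Sketch of a hash-based value source keyed by a list of identifiers."""

    def __init__(self, keys):
        # Assumed encoding: repr of the key tuple, which is unambiguous
        # for ints and strings and stable between runs.
        data = repr(tuple(keys)).encode("utf-8")
        digest = hashlib.sha256(data).digest()
        self._value = int.from_bytes(digest[:8], "big")  # 64-bit integer

    def hashint(self, low, high):
        """Return an integer in [low, high]; modulo bias is negligible here."""
        return low + self._value % (high - low + 1)

# Hypothetical sample data, for illustration only.
surname_list = ["Smith", "Jones", "Patel", "Garcia", "Nguyen"]
h1_groupseed, h2_people, h4_surname = 1, 2, 1

def surname_for(h3_index):
    key = [h1_groupseed, h2_people, h3_index, h4_surname]
    return surname_list[pghash(key).hashint(0, len(surname_list) - 1)]

# Person 5 can be generated without generating persons 1 to 4 first.
print(surname_for(5))
```

Because each value depends only on its own key, adding a new field constant such as a middle name leaves the existing surnames and forenames unchanged.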