Hash-based selection, based on choices and probabilities

Question

I have a list of choices in Python, like:

choices = ['A', 'B', 'C', 'D']

and a list of probabilities like:

probs = [0.5, 0.2, 0.1, 0.1]

They're related position-wise: A has a probability of 0.5, while D has a probability of 0.1.

I also have a list with ids:

list_ids = range(1000)

For each id in list_ids, I want to obtain one element of choices, which could be done like this:

import numpy as np
variants = np.random.choice(choices, len(list_ids), p=probs, replace=True)

However, I would like to have a deterministic mapping from the ids in the list to the choices, to be able to recover them from any computer at any moment. For that, I would use some kind of hashing function, something like:

import xxhash
hashes = [xxhash.xxh64(str(i).encode()).intdigest() for i in list_ids]  # `i` avoids shadowing the builtin `id`

and then do some operations on the hashes to guarantee that 50% of the samples go to A. How can I do that in Python?

Thanks!

NOTE: Setting the numpy seed doesn't solve it for me. I'd like someone who doesn't even use Python to be able to know, for a given id, which choice it was allocated to by the deterministic procedure.
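
One possible way to do this, as a minimal sketch rather than a definitive answer: hash each id, reduce the hash to an integer bucket, and compare the bucket against cumulative integer bounds derived from the probabilities. Everything named here is an assumption for illustration: `assign`, `BUCKETS` and `example_probs` are hypothetical names, `hashlib.sha256` from the standard library stands in for xxhash so the sketch runs without extra dependencies, and `example_probs` is a made-up list that sums to 1 (the `probs` above sum to 0.9).

```python
import hashlib
import math
from bisect import bisect_right
from itertools import accumulate

BUCKETS = 10_000  # hypothetical resolution: probabilities are honoured to 0.01%


def assign(item_id, choices, probs):
    """Deterministically map item_id to one of choices using a hash."""
    if len(choices) != len(probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("choices and probs must align, and probs must sum to 1")
    # Hash the decimal string of the id; any language can reproduce this step.
    digest = hashlib.sha256(str(item_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes as a big-endian unsigned integer.
    h = int.from_bytes(digest[:8], "big")
    bucket = h % BUCKETS  # integer in [0, BUCKETS)
    # Integer upper bound of each choice's bucket range.
    bounds = [round(c * BUCKETS) for c in accumulate(probs)]
    bounds[-1] = BUCKETS  # guard against floating-point rounding in the last bound
    # The first bound strictly greater than the bucket selects the choice.
    return choices[bisect_right(bounds, bucket)]


choices = ['A', 'B', 'C', 'D']
example_probs = [0.5, 0.2, 0.2, 0.1]  # hypothetical weights that sum to 1
list_ids = range(1000)
variants = [assign(i, choices, example_probs) for i in list_ids]
```

Working with integer buckets instead of floats should make this easier to reproduce elsewhere: compute the SHA-256 of the id's decimal string, take the first 8 bytes as a big-endian integer, reduce it modulo 10000, and compare against the same bounds. Note that the split is only approximately 50% for a finite list of ids, and the modulo introduces a negligible bias.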

`seed` the random number generator and the output will be reproducible on any computer. — mozway, Oct 24 '22 at 07:25
I'd like it to be reproducible even outside of Python environments; if we agree on the hash, we should all get the same results — David Masip, Oct 24 '22 at 07:27
Have a look at this: https://medium.com/deliberate-data-science/experimentation-platform-in-a-day-c60646ef1a2 — David Masip, Oct 24 '22 at 07:27
Experimentation platforms often use hashes instead of setting seeds — David Masip, Oct 24 '22 at 07:28
It seems that you actually want to build a generator of random numbers with probabilities. Why not look into the numpy code and take the implementation from there? — Nikolay Zakirov, Oct 24 '22 at 07:31
Yeah, I'm doing that, but the numpy implementation is super hard to understand — David Masip, Oct 24 '22 at 07:32
Then find out what type it is and also look up its description in https://en.wikipedia.org/wiki/List_of_random_number_generators - in fact you can pick a generator yourself, like a simple LCG. — tevemadar, Oct 24 '22 at 07:35
Python uses the Mersenne Twister (https://en.wikipedia.org/wiki/Mersenne_Twister) by default. There are multiple implementations in many languages, and the article itself includes one in pseudocode. If you port this implementation and use the same seed, you will get the same numbers — Nikolay Zakirov, Oct 24 '22 at 07:41
You could take this implementation, which doesn't look complicated, and reimplement it in any language: https://github.com/yinengy/Mersenne-Twister-in-Python/blob/master/RandomClass.py — Nikolay Zakirov, Oct 24 '22 at 07:45
