Is there a way to assign probabilities to samples in a random number generator?

Question

I have a financial dataset with monthly aggregates. I know the real world average for each measure.

I am trying to build some dummy transactional data using Python. I do not want the dummy transactional data to be entirely random. I want to model it around the real world averages I have.

For example, if the real monthly total profit is $1000 and there are 5 transactions in total, then the average profit per transaction is $200. I want to create dummy transactions that are modelled around this real-world average of $200. This is how I did it:

import pandas as pd
from random import gauss

# Draw 5 amounts from a normal distribution with mean 200 and standard deviation 50
bucket = [int(gauss(200, 50)) for _ in range(5)]

transactions = pd.DataFrame({'Amount': bucket})

Now, the challenge for me is that I have to randomize the identifiers too.

For example, I know for a fact that there are three buyers in total. Let's call them A, B and C. These three made those 5 transactions, and I want to assign them randomly when I create the dummy transactional data. However, I also know that A is likely to make many more transactions than B and C. To keep my dummy data close to real-life scenarios, I want to assign probabilities to the occurrence of these buyers in my dummy transactional data.

Let's say I want it like this:

A: 60% appearance
B: 20% appearance
C: 20% appearance

How can I achieve this?

you can use `random.choice(['A', 'A', 'A', 'B', 'C'])` - so 60% of this list is `A`, 20% of this list is `B`, 20% of this list is `C`. This way `A` will be selected more often. — furas, Nov 15 '19 at 07:32
@furas - Yes, I can but that's not scalable. What if I have a big list of customers and very precise probability numbers for each of them? — Nick Adams, Nov 15 '19 at 07:39
I couldn't find a less complex solution (like random.choices) when googling. Thanks! — Nick Adams, Nov 15 '19 at 07:49

Subhrajyoti Das · Accepted Answer · 2019-11-20T10:09:10.553

Strictly speaking, what you are asking for is not a single probability but a fixed share: you want A to account for about 60% of the purchases. To get that, take a dict as input that maps each buyer to their share of purchases. Then build a list in which each buyer appears in proportion to that share, scaled by a base count, and pick buyers at random from that list. Something like below:

import random

# Share of purchases made by each buyer
buy_percentage = {'A': 0.6, 'B': 0.2, 'C': 0.2}

# Number of purchases to generate; also the size of the weighted list
base = 100

# Build a list in which each buyer appears in proportion to their share
buy_list = []
for buyer, percentage in buy_percentage.items():
    # round() avoids losing an entry to float truncation, e.g. int(0.29 * 100) == 28
    buy_list.extend([buyer] * round(percentage * base))

for _ in range(base):
    # Pick a buyer at random; the ratio holds on average, not exactly in every run
    buyer = random.choice(buy_list)

    # your code to get buying price goes below
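
If the split has to be exact rather than only expected on average, one option is to build the list with the required counts and shuffle it instead of sampling from it. This is a minimal sketch, not part of the original answer; `n_transactions` and `pool` are hypothetical names, and the count of 10 is an assumption chosen so that every share works out to a whole number.

```python
import random

# Target shares per buyer (values taken from the question)
buy_percentage = {'A': 0.6, 'B': 0.2, 'C': 0.2}

# Hypothetical transaction count, chosen so each share is a whole number
n_transactions = 10

# Build a pool holding exactly the required number of entries per buyer
pool = []
for buyer, share in buy_percentage.items():
    pool.extend([buyer] * round(share * n_transactions))

# Shuffle in place: the order is random, the counts stay fixed
random.shuffle(pool)
```

Here `pool[i]` can be used as the buyer of the i-th transaction. When a share times the count is not a whole number, the rounded counts may not add up to `n_transactions`, so that case would need extra handling.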

UPDATE:

Alternatively, you can use the answer given in the link below. In my opinion, that solution is better.
A weighted version of random.choice
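
For reference, a weighted pick can be sketched with the standard library's `random.choices` (available since Python 3.6), which accepts a `weights` argument directly. The buyers, weights, and the `gauss(200, 50)` amounts come from the question; the seed and the name `sampled_buyers` are assumptions made only for illustration.

```python
import random

buyers = ['A', 'B', 'C']
weights = [0.6, 0.2, 0.2]  # relative weights; they do not have to sum to 1

# Seeded generator, used here only to make the sketch reproducible
rng = random.Random(42)

# Draw 5 buyers with replacement; each draw is independent
sampled_buyers = rng.choices(buyers, weights=weights, k=5)

# Pair each buyer with a simulated amount, mirroring gauss(200, 50) from the question
transactions = [(buyer, int(rng.gauss(200, 50))) for buyer in sampled_buyers]
```

Because every draw is independent, a small sample such as 5 transactions will not necessarily show a 60/20/20 split; the proportions only approach the weights as the number of draws grows. The list of tuples can be passed to `pd.DataFrame(transactions, columns=['Buyer', 'Amount'])` if a DataFrame is needed.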
