I have a financial dataset with monthly aggregates, and I know the real-world average for each measure.
I am trying to build some dummy transactional data using Python. I do not want the dummy data to be entirely random; I want to model it around the real-world averages I have.
For example, if the real data shows a monthly total profit of $1000 across 5 transactions, then the average profit per transaction is $200. I want to create dummy transactions that are modelled around this real-world average of $200. This is how I did it:
    import pandas as pd
    from random import gauss

    bucket = []
    for _ in range(5):
        # Draw one amount from a normal distribution (mean 200, std dev 50)
        bucket.append(int(gauss(200, 50)))

    transactions = pd.DataFrame({'Amount': bucket})
Now, the challenge for me is that I have to randomize the identifiers too.
For example, I know for a fact that there are three buyers in total. Let's call them A, B and C. These three made those 5 transactions, and I want to assign them randomly when I create the dummy transactional data. However, I also know that A is likely to make many more transactions than B and C. To keep my dummy data close to real-life scenarios, I want to assign probabilities to the occurrence of these buyers in my dummy transactional data.
Let's say I want it like this:
A : 60% appearance
B : 20% appearance
C : 20% appearance
How can I achieve this?
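One possible direction, offered as a minimal sketch rather than a definitive answer: the standard library's `random.choices` accepts a `weights` argument, so each transaction can be given a buyer drawn with the 60/20/20 probabilities above. The column name `Buyer`, the variable names, and the fixed seed are assumptions made for illustration only.

```python
import random

import pandas as pd

random.seed(42)  # assumption: fixed seed only so the sketch is reproducible

buyers = ["A", "B", "C"]
weights = [0.6, 0.2, 0.2]  # target appearance probabilities from the question
n_transactions = 5

# Draw amounts around the real-world average, as in the snippet above.
amounts = [int(random.gauss(200, 50)) for _ in range(n_transactions)]

# Draw one buyer per transaction; draws are independent and with replacement.
buyer_ids = random.choices(buyers, weights=weights, k=n_transactions)

transactions = pd.DataFrame({"Buyer": buyer_ids, "Amount": amounts})
print(transactions)
```

Note that the weights are probabilities, not quotas: with only 5 draws, the observed shares can differ noticeably from 60/20/20, and they only approach those values as the number of transactions grows. If NumPy is already in use, `numpy.random.choice(buyers, size=n_transactions, p=weights)` is a comparable alternative.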