I have a financial dataset with monthly aggregates. I know the real-world average for each measure.

I am trying to build some dummy transactional data using Python. I do not want the dummy transactional data to be entirely random; I want to model it around the real-world averages I have.

For example, if the real data shows a monthly total profit of $1000 across 5 transactions, then the average profit per transaction is $200. I want to create dummy transactions that are modelled around this real-world average of $200. This is how I did it:

import pandas as pd
from random import gauss

# Draw 5 transaction amounts from a normal distribution (mean 200, std dev 50)
bucket = [int(gauss(200, 50)) for _ in range(5)]

transactions = pd.DataFrame({'Amount': bucket})

Now, the challenge for me is that I have to randomize the identifiers too.

For example, I know for a fact that there are three buyers in total. Let's call them A, B and C. These three made those 5 transactions, and I want to assign them randomly when I create the dummy transactional data. However, I also know that A is likely to make many more transactions than B and C. To keep my dummy data close to real-life scenarios, I want to assign probabilities to the occurrence of these buyers in my dummy transactional data.

Let's say I want it like this:

A : 60% appearance
B : 20% appearance
C : 20% appearance

How can I achieve this?

Nick Adams

1 Answer

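What you are asking for is not quite a probability: you want A to account for 60% of the purchases, rather than A merely having some chance of buying. To do this, take a dict as input that holds each user's share of the purchases. Then build a list in which each user appears in proportion to that share, scaled by a base count, and randomly pick a buyer from the list. Note that random.choice samples with replacement, so the shares hold on average over many picks rather than exactly. Something like below: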

import random

# Share of purchases made by each user
buy_percentage = {'A': 0.6, 'B': 0.2, 'C': 0.2}

# Number of purchases to generate
base = 100

# Build a list in which each user appears in proportion to their share
buy_list = []
for buyer, percentage in buy_percentage.items():
    # round() avoids losing an entry to floating-point error
    buy_list.extend([buyer] * round(percentage * base))

for _ in range(base):
    # Pick a buyer at random; the ratio holds on average, not exactly
    buyer = random.choice(buy_list)

    # your code to get buying price goes below
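
If the 60/20/20 split has to hold exactly in the generated rows rather than only on average, one option is to fix the number of rows per buyer up front and shuffle the result. The sketch below is not part of the original answer: `assign_buyers_exact` is a hypothetical helper name, and it assumes the shares sum to 1. Leftover rows caused by rounding go to the buyers with the largest fractional remainders.

```python
import math
import random


def assign_buyers_exact(shares, n_transactions, seed=None):
    """Return one buyer per transaction, matching the shares as closely as possible.

    Hypothetical helper for illustration; assumes the shares sum to 1.
    """
    rng = random.Random(seed)

    # Ideal (possibly fractional) number of transactions per buyer
    raw = {buyer: share * n_transactions for buyer, share in shares.items()}

    # Start from the whole-number part of each ideal count
    counts = {buyer: math.floor(value) for buyer, value in raw.items()}

    # Hand any remaining transactions to the largest fractional remainders
    leftover = n_transactions - sum(counts.values())
    by_remainder = sorted(raw, key=lambda b: raw[b] - counts[b], reverse=True)
    for buyer in by_remainder[:leftover]:
        counts[buyer] += 1

    # Expand the counts into a list and shuffle so the order is random
    buyers = [buyer for buyer, count in counts.items() for _ in range(count)]
    rng.shuffle(buyers)
    return buyers


buyers = assign_buyers_exact({'A': 0.6, 'B': 0.2, 'C': 0.2}, 5, seed=42)
```

With 5 transactions this always yields three rows for A and one each for B and C; only their order varies.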

UPDATE:

Alternatively, the approach described in the question linked below can be used. In my opinion, that solution is better.
A weighted version of random.choice
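
As a minimal sketch of the weighted approach, Python 3.6+ ships `random.choices`, which accepts a `weights` argument and draws `k` items with replacement. The column names `Buyer` and `Amount` and the fixed seed are assumptions made for illustration; the mean of 200 and standard deviation of 50 are taken from the question.

```python
import random

buyers = ['A', 'B', 'C']
weights = [0.6, 0.2, 0.2]  # probability of each buyer on any single draw
n_transactions = 5

# Seeded only so that this sketch is repeatable; drop the seed for real use
rng = random.Random(0)

# One independent weighted draw per transaction
buyer_ids = rng.choices(buyers, weights=weights, k=n_transactions)

# Amounts modelled around the real-world average, as in the question
amounts = [int(rng.gauss(200, 50)) for _ in range(n_transactions)]

# One record per dummy transaction
transactions = [
    {'Buyer': buyer, 'Amount': amount}
    for buyer, amount in zip(buyer_ids, amounts)
]
```

The list of records can be passed straight to `pd.DataFrame(transactions)`. Because each draw is independent, a small sample such as 5 rows will often deviate from the 60/20/20 split; the proportions only settle near the weights as the number of transactions grows.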

Subhrajyoti Das