Generate column in pandas dataframe with specific frequency from a list

Question

I am trying to create a dataframe with 4 columns, 'date', 'age', 'conversion' and 'marital_status', where marital status is one of 4 choices (married, divorced, single, unknown). I can create the dataframe with the following code, but I am not sure how to specify how often each choice appears. I want married to be 50%, divorced 30%, single 15% and the rest unknown. How do I do this?

import pandas as pd
import numpy as np
import random

random.seed(30)
np.random.seed(30)

start_date, end_date = '1/1/2015', '12/31/2019'
date_rng = pd.date_range(start=start_date, end=end_date, freq='D')
length_of_field = date_rng.shape[0]
df = pd.DataFrame(date_rng, columns=['date'])
df['age'] = np.random.randint(18,100,size=(len(date_rng)))
df['conversion'] = np.random.randint(0,2,size=(len(date_rng)))
marital_status = ('divorced','married','single','unknown')
group_1 = [random.choice(marital_status) for _ in range(length_of_field)]
df['marital_status'] = group_1
print('\ndf:')
print(df)
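Sampling with probabilities, as the answers below do, hits these shares only on average. If the shares must hold exactly, one alternative (not taken from any of the answers) is to repeat each label a fixed number of times and shuffle. A minimal sketch, assuming each count is rounded down and 'unknown' absorbs the remainder:

```python
import numpy as np
import pandas as pd

np.random.seed(30)

date_rng = pd.date_range(start='1/1/2015', end='12/31/2019', freq='D')
n = len(date_rng)  # 1826 daily rows

shares = {'married': 0.5, 'divorced': 0.3, 'single': 0.15}
# Fixed count per status, rounded down; 'unknown' takes whatever is left over
counts = {status: int(n * share) for status, share in shares.items()}
counts['unknown'] = n - sum(counts.values())

# Repeat each label by its count, then shuffle in place so the order is random
statuses = np.repeat(list(counts), list(counts.values()))
np.random.shuffle(statuses)

df = pd.DataFrame({'date': date_rng, 'marital_status': statuses})
print(df['marital_status'].value_counts())
```

Here the counts come out as 913 married, 547 divorced, 273 single and 93 unknown, every time, regardless of the seed; only their positions in the column change.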

Comment (AMC, Feb 08 '20 at 03:13): https://stackoverflow.com/questions/3679694/a-weighted-version-of-random-choice?noredirect=1&lq=1

score 1 · Answer 1 · answered Feb 07 '20 at 20:56

You can use numpy.random.choice. Its p parameter gives the probability of each class, in the same order as the labels, and the probabilities must sum to 1.

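import numpy as np
# p is matched to marital_status by position: divorced, married, single, unknown
df['marital_status'] = np.random.choice(marital_status, size=length_of_field, p=[0.3, 0.5, 0.15, 0.05])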
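A self-contained sketch of this approach, reusing the question's date range, seed and label order, that checks the result with pandas' value_counts(normalize=True). With 1,826 daily rows the observed shares land near the 30/50/15/5 targets but not exactly on them:

```python
import numpy as np
import pandas as pd

np.random.seed(30)

# Same date range as in the question: one row per day
date_rng = pd.date_range(start='1/1/2015', end='12/31/2019', freq='D')
df = pd.DataFrame(date_rng, columns=['date'])

marital_status = ('divorced', 'married', 'single', 'unknown')
# p is matched to marital_status by position and must sum to 1
probs = [0.3, 0.5, 0.15, 0.05]
df['marital_status'] = np.random.choice(marital_status, size=len(df), p=probs)

# Observed share of each status: close to, but not exactly, the targets
shares = df['marital_status'].value_counts(normalize=True)
print(shares)
```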

cmxu

score 1 · Answer 2 · answered Feb 07 '20 at 21:04

Try:

# Labels listed in the same order as their probabilities
np.random.choice(['married', 'divorced', 'single', 'unknown'], size=len(date_rng), p=[0.5, 0.3, 0.15, 0.05])
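Because p is matched to the labels purely by position, a label list and a probability list can easily drift out of order. One way to keep them together is to store each label with its share in a dict. A minimal sketch, assuming NumPy 1.17 or later for np.random.default_rng (the newer Generator API, which this answer does not use; seeding it with 30 gives a different stream than np.random.seed(30)):

```python
import numpy as np
import pandas as pd

# Each label sits next to its probability, so the pairing cannot get mixed up
target_shares = {'married': 0.5, 'divorced': 0.3, 'single': 0.15, 'unknown': 0.05}

rng = np.random.default_rng(30)  # NumPy Generator API (NumPy >= 1.17)
date_rng = pd.date_range(start='1/1/2015', end='12/31/2019', freq='D')

labels = list(target_shares)
probs = list(target_shares.values())
# Generator.choice raises ValueError if probs does not sum to 1
statuses = rng.choice(labels, size=len(date_rng), p=probs)

df = pd.DataFrame({'date': date_rng, 'marital_status': statuses})
print(df['marital_status'].value_counts(normalize=True))
```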

Petar Atanasov

score 0 · Answer 3 · answered Feb 07 '20 at 20:58

You can use random.choices (inspired by this question):

marital_status = random.choices(
    population=['divorced','married','single','unknown'],
    weights=[0.3, 0.5, 0.15, 0.05],
    k=df.shape[0]
)
df['marital_status'] = marital_status
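random.choices comes from the standard random module (Python 3.6+), so it is seeded by random.seed rather than np.random.seed. Its weights are relative, so they need not sum to 1. A small sketch, outside pandas, that uses integer weights and counts the resulting shares; n_rows = 1826 is the number of days in the question's date range:

```python
import random
from collections import Counter

random.seed(30)  # random.choices uses the random module's state, not np.random's

population = ['divorced', 'married', 'single', 'unknown']
# Relative weights: 30/50/15/5 gives the same distribution as 0.3/0.5/0.15/0.05
weights = [30, 50, 15, 5]
n_rows = 1826  # days from 2015-01-01 to 2019-12-31 inclusive

statuses = random.choices(population, weights=weights, k=n_rows)

# Fraction of rows assigned to each status
counts = Counter(statuses)
shares = {status: counts[status] / n_rows for status in population}
print(shares)
```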

XavierBrt