I am trying to create a DataFrame with four columns: 'date', 'age', 'conversion', and 'marital_status', where marital status is one of four choices (married, divorced, single, unknown). I can create the DataFrame with the following code, but I am not sure how to specify the frequency of each choice. I want married to be 50%, divorced 30%, single 15%, and the remaining 5% unknown. How do I do this?
import pandas as pd
import numpy as np
import random
random.seed(30)
np.random.seed(30)
start_date, end_date = '1/1/2015', '12/31/2019'
date_rng = pd.date_range(start=start_date, end=end_date, freq='D')
length_of_field = date_rng.shape[0]
df = pd.DataFrame(date_rng, columns=['date'])
df['age'] = np.random.randint(18,100,size=(len(date_rng)))
df['conversion'] = np.random.randint(0,2,size=(len(date_rng)))
# random.choice draws uniformly, so each status appears about 25% of the time
marital_status = ('divorced','married','single','unknown')
group_1 = [random.choice(marital_status) for _ in range(length_of_field)]
df['marital_status'] = group_1
print('\ndf:')
print(df)
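
One possible approach, assuming the percentages are meant as sampling probabilities rather than exact counts, is to pass them to NumPy's `choice` through its `p` argument. The sketch below uses `np.random.default_rng` instead of the seeded global state in the code above, and the names `statuses`, `weights`, and `observed` are only illustrative.

```python
import numpy as np
import pandas as pd

# Seeded generator so the draw is reproducible (assumption: any seed works).
rng = np.random.default_rng(30)

date_rng = pd.date_range(start='1/1/2015', end='12/31/2019', freq='D')

# Each status is paired with its target probability; the weights must sum to 1.
statuses = ['married', 'divorced', 'single', 'unknown']
weights = [0.50, 0.30, 0.15, 0.05]

df = pd.DataFrame({'date': date_rng})
# Draw one status per row, using the weights as sampling probabilities.
df['marital_status'] = rng.choice(statuses, size=len(date_rng), p=weights)

# Observed shares will be close to, but not exactly, the target weights.
observed = df['marital_status'].value_counts(normalize=True)
print(observed)
```

Because each row is drawn independently, the resulting shares only approximate 50/30/15/5. The standard library offers a similar weighted draw with `random.choices(statuses, weights=weights, k=len(date_rng))`.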