Piggybacking off my own previous question, "python pandas: assign control vs. treatment groupings randomly based on %".
Thanks to @MaxU, I know how to assign random control/treatment groupings to 2 groups, but what if I have 3 or more groups?
For example:
df.head()
customer_id | Group | many other columns
ABC         | 1
CDE         | 3
BHF         | 2
NID         | 1
WKL         | 3
SDI         | 2
JSK         | 1
OSM         | 3
MPA         | 2
MAD         | 1
pd.pivot_table(df,index=['Group'],values=["customer_id"],aggfunc=lambda x: len(x.unique()))
Group 1 : 270
Group 2 : 180
Group 3 : 330
I have a great answer for the case where I only have two groups:
df['Flag'] = df.groupby('Group')['customer_id']\
    .transform(lambda x: np.random.choice(['Control','Test'], len(x),
                                          p=[.5,.5] if x.name==1 else [.4,.6]))
But what if I want to split it this way:
- Group 1: 50% Control & 50% Test
- Group 2: 40% Control & 60% Test
- Group 3: 20% Control & 80% Test
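To make the targets concrete, here is a small sketch that turns these percentages into exact counts, using the group sizes from the pivot table above. The names `sizes`, `control_share` and `targets` are just placeholders I made up for illustration.

```python
# Group sizes from the pivot table above; Control shares from the list above.
sizes = {1: 270, 2: 180, 3: 330}
control_share = {1: 0.5, 2: 0.4, 3: 0.2}

# Exact (Control, Test) counts per group. round() is only there for group
# sizes that do not divide evenly; Test simply takes whatever is left over.
targets = {}
for g, n in sizes.items():
    n_control = round(n * control_share[g])
    targets[g] = (n_control, n - n_control)

print(targets)  # {1: (135, 135), 2: (72, 108), 3: (66, 264)}
```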
@MaxU's answer is great, but unfortunately the split is not exact:
d = {1:[.5,.5], 2:[.4,.6], 3:[.2,.8]}
df['Flag'] = df.groupby('Group')['customer_id'] \
    .transform(lambda x: np.random.choice(['Control','Test'], len(x), p=d[x.name]))
When I test it, I don't get exact splits:
pd.pivot_table(df,index=['Group'],values=["customer_id"],columns=['Flag'], aggfunc=lambda x: len(x.unique()))
           Control   Test
Group 1:   138       132
Group 2:    78       102
Group 3:    79       251
For example, Group 1 (270 customers at 50/50) should be exactly 135/135.
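To show the kind of result I am after, here is a rough sketch of one possible approach: instead of drawing each label independently with `np.random.choice`, build the exact number of labels per group and shuffle them. This is only an illustration under some assumptions: the data frame is a stand-in with made-up customer ids and one row per customer, and `control_share`, `exact_flags` and the seed are names and values I picked for the example.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same group sizes as above (customer ids are made up).
sizes = {1: 270, 2: 180, 3: 330}
df = pd.DataFrame({
    'customer_id': [f'C{i:04d}' for i in range(sum(sizes.values()))],
    'Group': np.repeat(list(sizes.keys()), list(sizes.values())),
})

control_share = {1: 0.5, 2: 0.4, 3: 0.2}
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

def exact_flags(x):
    # Create exactly round(n * share) 'Control' labels and fill the rest
    # with 'Test', then shuffle so the row-to-label assignment stays random.
    n_control = int(round(len(x) * control_share[x.name]))
    labels = np.array(['Control'] * n_control + ['Test'] * (len(x) - n_control))
    return pd.Series(rng.permutation(labels), index=x.index)

df['Flag'] = df.groupby('Group')['customer_id'].transform(exact_flags)

# Check the split per group.
counts = pd.crosstab(df['Group'], df['Flag'])
print(counts)
```

With this stand-in data the counts come out as 135/135, 72/108 and 66/264. If a customer can appear on several rows, the labels would presumably need to be assigned per unique customer_id and merged back, which this sketch does not cover.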