Randomly sample to create a new dataframe where each type has an equal probability, dependent on a column, in Python

Question

I have a dataframe in which each type appears a different number of times, and I want to create a subset where each type has an equal probability of being selected. For example, say I have something like this:

Now say I want to create a new dataframe of length 10 such that each type has an equal probability of being selected. How would I do this in Python? I was trying to follow this here but didn't get far.

Groupby has a sample function: [Docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html) — noah, Nov 16 '20 at 17:29
When you figure out how to apply the docs for your example consider posting as an answer to your question for other users in the future — noah, Nov 16 '20 at 19:04

score 0 · Accepted Answer · answered Nov 16 '20 at 20:38

Following the docs here, we can sample the same number of rows from each type:

n_per_type = 2  # number of rows to draw from each type; must be an integer
new_df = df.groupby("type").sample(n=n_per_type, random_state=1)
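
For anyone who wants to try this out, here is a minimal sketch on a made-up dataframe (the `value` column and the row counts are assumptions, not the asker's data). It first runs the `groupby(...).sample(...)` call from above, which returns the same number of rows per type. It then shows one possible alternative for the "length 10" case in the question: weighting each row by the inverse of its type's frequency, so that every draw is equally likely to land on any type. This assumes sampling with replacement is acceptable, since a rare type may have fewer rows than needed.

```python
import pandas as pd

# Hypothetical data: type "a" is far more common than "b" or "c".
df = pd.DataFrame({
    "type": ["a"] * 6 + ["b"] * 3 + ["c"] * 1,
    "value": range(10),
})

# Approach from the answer: draw the same number of rows from every type.
n_per_type = 1
balanced = df.groupby("type").sample(n=n_per_type, random_state=1)

# Alternative sketch: a fixed-length sample where each type is equally likely.
# Weighting a row by 1 / (number of rows of its type) gives every type
# the same total weight; pandas normalizes the weights to sum to 1.
weights = 1 / df.groupby("type")["type"].transform("count")
new_df = df.sample(n=10, weights=weights, replace=True, random_state=1)

print(balanced)
print(new_df["type"].value_counts())
```

The first result always contains exactly `n_per_type` rows per type, while the second has exactly 10 rows whose type counts vary from run to run but are equal in expectation.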

Snorrlaxxx