
I have a dataframe in which the types have different numbers of rows, and I want to create a subset where each type has an equal probability of being selected. For example, say I have something like this:

(example dataframe with a `type` column; the original screenshot is not available)

Now say I want to create a new dataframe with 10 rows such that each type has an equal probability of being selected. How would I do this in Python? I tried following this here but didn't get far.

Snorrlaxxx
  • Groupby has a sample function: [Docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html) – noah Nov 16 '20 at 17:29
  • @noah That works, thanks! – Snorrlaxxx Nov 16 '20 at 17:36
  • When you figure out how to apply the docs to your example, consider posting it as an answer to your question for other users in the future. – noah Nov 16 '20 at 19:04

1 Answer


Following the docs for `DataFrameGroupBy.sample`, we can simply do this:

# n is the number of rows drawn from each type (2 is a placeholder value); requires pandas >= 1.1
new_df = df.groupby("type").sample(n=2, random_state=1)
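A self-contained sketch of the same call on a small, made-up dataframe. The `type`/`value` columns and the group sizes are assumptions, since the original example was only a screenshot. Note that `n` applies per group, so the result has `n` times the number of types rows, not a fixed total.

```python
import pandas as pd

# Hypothetical toy data (the original example dataframe was not preserved):
# type "a" has 6 rows, "b" has 4, "c" has 3.
df = pd.DataFrame({
    "type": ["a"] * 6 + ["b"] * 4 + ["c"] * 3,
    "value": range(13),
})

# Draw the same number of rows from every type with DataFrameGroupBy.sample
# (available since pandas 1.1). n is per group, so the result has
# n_per_type * number_of_types rows.
n_per_type = 2
new_df = df.groupby("type").sample(n=n_per_type, random_state=1)

print(new_df["type"].value_counts())
```

If any type has fewer than `n` rows, pandas should raise a `ValueError` unless you also pass `replace=True`.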
Snorrlaxxx
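Strictly speaking, `groupby(...).sample(n=...)` returns exactly `n` rows per type rather than a fixed total of 10 rows in which each type is equally likely. A different technique that targets that wording is weighted sampling with `DataFrame.sample`: give each row a weight of 1 / (number of rows of its type), so every type carries the same total weight. This is a sketch on made-up data (same hypothetical `type`/`value` columns as above), not the approach from the accepted answer.

```python
import pandas as pd

# Hypothetical toy data with unequal group sizes (a: 6 rows, b: 4, c: 3).
df = pd.DataFrame({
    "type": ["a"] * 6 + ["b"] * 4 + ["c"] * 3,
    "value": range(13),
})

# Weight each row by 1 / (size of its type) so every type sums to the same weight.
group_size = df["type"].map(df["type"].value_counts())
weights = 1.0 / group_size

# replace=True keeps every draw uniform over types; without replacement the
# per-type probabilities drift as rows are removed from the pool.
new_df = df.sample(n=10, weights=weights, replace=True, random_state=1)

print(new_df["type"].value_counts())
```

The trade-off: with `replace=True` the same row can appear more than once; with `replace=False` rows are unique, but only the first draw is exactly uniform over types.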