0

I have not been able to come up with a better title, it's a really simple issue though, I just don't know what to call it exactly.

I have a database of horses simplified here:

horse_name stable_name
Horse1 Stable1

I am only interested in further analyzing records which feature stables that own many horses so I wanted to filter out the small stables (ones with less than 10 horses).

What I've tried:

Attempt 1:

Step 1: df['Stable'].value_counts() > 10 -> gives me boolean values, I inteded to use this to only query the part of the database that satisfied this condition.

Step 2: df[df['Stable'].value_counts() > 10] -> I wrap this in another df, hoping I get the result that I want, but I don't, I get a key error.

Attempt 2:

Step 1: df['Stable'].value_counts().sort_values(ascending=False).head(21) -> a little clunky, but by trial and error, I figured out there are 21 stables with more than 10 horses, and this query returned just those stables. All I needed now is to filter the database out using this result.

Step 2: df[df['Stable'].value_counts().sort_values(ascending=False).head(21)] -> same issue, returns a key error.

I also tried: df[df['Stable'] in df['Stable'].value_counts() > 10] again, that didn't work, and I don't think I'll sleep today.

Can anyone explain why this is happening in a way that I can understand? And how should this be done instead?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459

2 Answers2

1

Try this out If you post the data, I would have been able to share the result

# create a temporary column 'c' by grouping on stable
# transform associates the result to all rows that are part of groupby
# we then check for 'c' and where its value is true

(df[df.assign(c=df.groupby(['Stable'],as_index=False)['Stable']
              .transform(lambda x: x.count() < 10) )['c']
    .eq(True)])

or

# create a temporary column 'c' by grouping on stable
# and assiging count to it using transform
# finally, using loc to select rows that has a count fewer than 10

(df.loc[df.assign(
    c=df.groupby(['stable'])['stable']
    .transform('count'))['c']<10]
)

Naveed
  • 11,495
  • 2
  • 14
  • 21
0

.value_counts() returns a series where it counts the unique values of the values in the column.

Try this:

df[df['Stable'] > 10]

Mudassir
  • 11
  • 1
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Oct 20 '22 at 23:56