I have not been able to come up with a better title, it's a really simple issue though, I just don't know what to call it exactly.
I have a database of horses simplified here:
horse_name | stable_name |
---|---|
Horse1 | Stable1 |
I am only interested in further analyzing records which feature stables that own many horses so I wanted to filter out the small stables (ones with less than 10 horses).
What I've tried:
Attempt 1:
Step 1: df['Stable'].value_counts() > 10
-> gives me boolean values, I inteded to use this to only query the part of the database that satisfied this condition.
Step 2: df[df['Stable'].value_counts() > 10]
-> I wrap this in another df, hoping I get the result that I want, but I don't, I get a key error.
Attempt 2:
Step 1: df['Stable'].value_counts().sort_values(ascending=False).head(21)
-> a little clunky, but by trial and error, I figured out there are 21 stables with more than 10 horses, and this query returned just those stables. All I needed now is to filter the database out using this result.
Step 2: df[df['Stable'].value_counts().sort_values(ascending=False).head(21)]
-> same issue, returns a key error.
I also tried: df[df['Stable'] in df['Stable'].value_counts() > 10]
again, that didn't work, and I don't think I'll sleep today.
Can anyone explain why this is happening in a way that I can understand? And how should this be done instead?