Selecting rows based on a '>' condition of the iteration of one of the columns

Question

I have not been able to come up with a better title, it's a really simple issue though, I just don't know what to call it exactly.

I have a database of horses simplified here:

horse_name	stable_name
Horse1	Stable1

I am only interested in further analyzing records which feature stables that own many horses so I wanted to filter out the small stables (ones with less than 10 horses).

What I've tried:

Attempt 1:

Step 1: df['Stable'].value_counts() > 10 -> gives me boolean values, I inteded to use this to only query the part of the database that satisfied this condition.

Step 2: df[df['Stable'].value_counts() > 10] -> I wrap this in another df, hoping I get the result that I want, but I don't, I get a key error.

Attempt 2:

Step 1: df['Stable'].value_counts().sort_values(ascending=False).head(21) -> a little clunky, but by trial and error, I figured out there are 21 stables with more than 10 horses, and this query returned just those stables. All I needed now is to filter the database out using this result.

Step 2: df[df['Stable'].value_counts().sort_values(ascending=False).head(21)] -> same issue, returns a key error.

I also tried: df[df['Stable'] in df['Stable'].value_counts() > 10] again, that didn't work, and I don't think I'll sleep today.

Can anyone explain why this is happening in a way that I can understand? And how should this be done instead?

Does this answer your question? [Pandas: Selecting rows based on value counts of a particular column](https://stackoverflow.com/questions/36166090/pandas-selecting-rows-based-on-value-counts-of-a-particular-column) — Ignatius Reilly, Oct 19 '22 at 00:09
Yes Naveed, it worked immediately, didn't even need to change anything — lushyplushy, Oct 19 '22 at 10:17

Naveed · Accepted Answer · 2022-10-19T00:19:42.787

Try this out If you post the data, I would have been able to share the result

# create a temporary column 'c' by grouping on stable
# transform associates the result to all rows that are part of groupby
# we then check for 'c' and where its value is true

(df[df.assign(c=df.groupby(['Stable'],as_index=False)['Stable']
              .transform(lambda x: x.count() < 10) )['c']
    .eq(True)])

or

# create a temporary column 'c' by grouping on stable
# and assiging count to it using transform
# finally, using loc to select rows that has a count fewer than 10

(df.loc[df.assign(
    c=df.groupby(['stable'])['stable']
    .transform('count'))['c']<10]
)

score 0 · Answer 2 · answered Oct 19 '22 at 00:03

0

.value_counts() returns a series where it counts the unique values of the values in the column.

Try this:

df[df['Stable'] > 10]

answered Oct 19 '22 at 00:03

Mudassir

11
1

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Oct 20 '22 at 23:56

Selecting rows based on a '>' condition of the iteration of one of the columns

2 Answers2

Linked