Excluding groups from dataframe based on count in pandas

Question

                             Attribute
                                 count
Crop Type     Harvest Season
Barley        Spring                25
Corn (Grain)  Spring               655
              Winter                 1
Corn (Silage) Spring                 6
Cotton        Spring                 5
Peas          Spring                 3
Canola        Spring               169
              Winter               164
Soybeans      Spring               541
              Winter                 2
Sugar beet    Spring                82
Sunflower     Spring               637
              Winter                 1
Wheat         Spring               253
              Winter               451

I have a dataframe df in which I counted the number of rows in each group using the following code:
df[['Crop Type', 'Harvest Season']].groupby(['Crop Type', 'Harvest Season']).agg(['count'])

The output of this code is the dataframe above.

How can I exclude groups from the original dataframe where the count is less than 30?

Does this answer your question? [How to select rows from a DataFrame based on column values](https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values) — The Grand J, Nov 16 '20 at 05:04
If you look at the multiple conditions section of the first answer just use half of it to check for < 30 — The Grand J, Nov 16 '20 at 05:06
No it does not answer, I am using the results of a count on a groupby operation to exclude rows from the original dataframe — user308827, Nov 16 '20 at 05:06

score 2 · Accepted Answer · answered Nov 16 '20 at 05:10

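You can invert the logic: instead of excluding small groups, keep every row whose group has a count of at least 30. GroupBy.transform('count') repeats each group's count for every row of that group, so the result is a Series with the same length as the original DataFrame. Comparing it with Series.ge(30) gives a boolean mask that can be used for boolean indexing: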

df[df.groupby(['Crop Type', 'Harvest Season'])['Crop Type'].transform('count').ge(30)]
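
To see the one-liner above at work, here is a minimal, self-contained sketch on made-up data. The toy rows, the `Yield` column, and the threshold of 3 (instead of 30) are assumptions chosen only to keep the example small; they are not from the question.

```python
import pandas as pd

# Hypothetical toy data: the rows and the 'Yield' column are invented for illustration.
df = pd.DataFrame({
    "Crop Type": ["Wheat"] * 4 + ["Corn (Grain)"] * 4,
    "Harvest Season": ["Spring"] * 4 + ["Spring"] * 3 + ["Winter"],
    "Yield": range(8),
})

min_count = 3  # the question uses 30; 3 keeps the toy example small

# transform('count') returns one value per original row: the size of that row's group
group_sizes = df.groupby(["Crop Type", "Harvest Season"])["Crop Type"].transform("count")

# Boolean mask aligned to df's index; keep only rows whose group is large enough
filtered = df[group_sizes.ge(min_count)]
print(filtered)
```

Here the single `Corn (Grain)` / `Winter` row is dropped because its group has only one member, while all rows of the two larger groups are kept.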

jezrael


thanks! Why did you do `df[df.groupby(['Crop Type', 'Harvest Season'])['Crop Type'].transform('count').ge(30)]` and not `df[df.groupby(['Crop Type', 'Harvest Season'])['Crop Type', 'Harvest Season'].transform('count').ge(30)]` – user308827 Nov 16 '20 at 05:24
@user308827 - Because `df.groupby(['Crop Type', 'Harvest Season'])['Crop Type', 'Harvest Season'].transform('count')` returns a two-column DataFrame, while filtering rows needs only a `Series`. So `df.groupby(['Crop Type', 'Harvest Season'])['Crop Type', 'Harvest Season'].transform('count').ge(30)` gives a boolean DataFrame with the same value in every column of a given row (as long as there is no missing data, because `count` excludes missing values) – jezrael Nov 16 '20 at 05:30
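
The difference described in this comment can be checked with a small sketch. The toy data and the threshold of 3 are assumptions for illustration. The two-column selection is written in list form (`[keys]`), since recent pandas versions require a list when selecting several columns from a groupby.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration.
df = pd.DataFrame({
    "Crop Type": ["Wheat", "Wheat", "Wheat", "Peas"],
    "Harvest Season": ["Spring", "Spring", "Spring", "Spring"],
})
keys = ["Crop Type", "Harvest Season"]

# One column selected: transform returns a Series, usable directly as a row mask
mask_series = df.groupby(keys)["Crop Type"].transform("count").ge(3)

# Two columns selected (list form): transform returns a two-column DataFrame
mask_frame = df.groupby(keys)[keys].transform("count").ge(3)

print(type(mask_series).__name__)
print(type(mask_frame).__name__)

# Row filtering with the Series mask
print(df[mask_series])
```

If the two-column form is used anyway, reducing it with `mask_frame.all(axis=1)` yields a Series again, but selecting a single column up front is simpler.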
