Python: How to remove dict entries based on number of rows in each entry

Question

I have created a dictionary of small DataFrames from one large DataFrame by grouping on a column value:

dict1 = {k: v for k, v in df.groupby('Some Column Name')}

I want to pass these to a second dictionary and drop DataFrames based on the number of rows in them. For example, any DataFrame with fewer than 20 rows should be ignored.

I can drop them based on values like this, but I can't find a way to reference the row count directly:

dict2 = {k: v for k, v in dict1.items() if v[0] <=20}

Any help is appreciated, thanks.

Use `len`. `dict2 = {k: v for k, v in dict1.items() if len(v) >=20}` — James, Nov 18 '19 at 10:55

score 1 · Accepted Answer · answered Nov 18 '19 at 11:10

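You can use `shape[0]` with the code you've already written. The first value in a DataFrame's `shape` is the number of rows and the second is the number of columns. The check has to be made on each group `v`, not on the original `df`, and it should keep the groups with at least 20 rows:

dict2 = {k: v for k, v in dict1.items() if v.shape[0] >= 20}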
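To see why the condition should look at each group rather than the full frame, here is a minimal sketch. The sample data and the names `group_rows` and `total_rows` are only for illustration. Inside the comprehension, `v` is the sub-DataFrame for one key, so `v.shape[0]` (or `len(v)`) is that group's row count, while `df.shape[0]` is always the total.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({"code": ["ch", "bt", "ch", "bt", "aw"],
                   "freq": [2, 4, 10, 5, 3]})

# Row count of each group, taken from the group itself (v), not from df.
group_rows = {k: v.shape[0] for k, v in df.groupby("code")}

# Row count of the whole frame; this is the same for every group.
total_rows = df.shape[0]

print(group_rows)  # {'aw': 1, 'bt': 2, 'ch': 2}
print(total_rows)  # 5
```

A filter written against `df.shape[0]` therefore keeps either every group or none of them.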

I've created this small dataframe to show you this:

The dataframe is:

import pandas as pd

df = pd.DataFrame([['ch', 2], ['bt', 4], ['ch', 10], ['bt', 5], ['aw', 3]], columns=['code', 'freq'])
print(df.shape)
(5, 2)  # 5 rows, 2 columns

# Keep only groups with at least 2 rows ('aw' has 1 row and is dropped)
dict1 = {k: v for k, v in df.groupby('code') if v.shape[0] >= 2}
print(dict1)

{'bt':   code  freq
1   bt     4
3   bt     5, 'ch':   code  freq
0   ch     2
2   ch    10}

# No group has more than 2 rows, so nothing is kept
dict1 = {k: v for k, v in df.groupby('code') if v.shape[0] > 2}
print(dict1)
{}
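As an alternative, the filtering can be done before building the dictionary with pandas' `groupby(...).filter(...)`. The following is only a sketch: the helper name `split_large_groups`, its parameters, and the sample data are hypothetical and not part of the original question.

```python
import pandas as pd

def split_large_groups(df, column, min_rows):
    """Return {key: sub-DataFrame} for groups with at least min_rows rows."""
    # filter() keeps every row that belongs to a group for which
    # the function returns True, and drops the other groups entirely.
    kept = df.groupby(column).filter(lambda g: len(g) >= min_rows)
    return {k: v for k, v in kept.groupby(column)}

# Hypothetical data: 25 rows for "a" and 5 rows for "b".
df = pd.DataFrame({"code": ["a"] * 25 + ["b"] * 5, "freq": range(30)})

dict2 = split_large_groups(df, "code", min_rows=20)
print(list(dict2))  # ['a']
```

This gives the same result as the dictionary comprehension with `len(v) >= 20`; which one reads better is mostly a matter of taste.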
