I am using groupby on a pandas dataframe. I want to use something like a having clause to exclude empty lists

Question

I have these lines of code;

for name in clean.alpha_names:
    CategoryGroupLists=df_new_housing.groupby(name)['SALE PRICE'].apply(list)

When I run the code, the CategoryGroupLists contains content that looks like this;

I want to remove the empty list entries for FIELDSTON and FINANCIAL. I think I need something like a HAVING clause next to my GROUPBY, although PANDAS does not have a HAVING clause. So how do I do this?

EDIT: One suggestion is that I use a filter. However with the following code I get an error;

for name in clean.alpha_names:
    CategoryGroupLists=df_new_housing.groupby(name)['SALE PRICE'].filter(lambda x: len(x) > 1).apply(list)

TypeError: 'int' object is not iterable

EDIT2 My workaround for now is to use the following code. However I am sure there is a better way of doing it. Follow the original assignment with;

# Remove empty lists from the CategoryGroupLists
for idx in (CategoryGroupLists.index):
    if len(CategoryGroupLists[idx]) == 0:
        del CategoryGroupLists[idx]

I think you are looking for **filter** as described by this answer: https://stackoverflow.com/questions/22105452/what-is-the-equivalent-of-sql-group-by-having-on-pandas — ScottC, Dec 06 '22 at 14:38
Thank you for that @ScottC. I applied a filter (see the question) and I get a new error. — arame3333, Dec 06 '22 at 14:46

score 0 · Answer 1 · answered Dec 06 '22 at 16:09

0

I think the best way to go about this is to first filter out the names that do not have sales, and then perform the groupby.

For example let us say you had the following dataframe:

df = pd.DataFrame({'name':['FLATIRON','FLATIRON','FLATIRON','FLATIRON','FINANCIAL','FIELDSTON'],
                   'SALE PRICE':[489854,576299,6625000,4650000,'','']})

We now filter out the names without data, and then perform the groupby.

# Filter out blank sales
df_with_sales = df[df['SALE PRICE'] != '']

# Group sales by name (and generate list)
CategoryGroupLists = df_with_sales.groupby('name')['SALE PRICE'].apply(list)

print(CategoryGroupLists)

FLATIRON    [489854, 576299, 6625000, 4650000]

answered Dec 06 '22 at 16:09

ScottC

3,941
1
6
20

I wish I could explain to you why this does not work but I can't. There are no null or zero SALE PRICE entries in the dataset because I previously removed them, and I also checked before running this code. I should not get any empty lists in the first place then? Yes you are right. But I do anyway. – arame3333 Dec 06 '22 at 16:25
what does your dataframe look like for `FINANCIAL` and `FIELDSTON`? – ScottC Dec 06 '22 at 22:21

I am using groupby on a pandas dataframe. I want to use something like a having clause to exclude empty lists

1 Answers1