22

I am using pandas groupby and want to apply a function that builds a set from the items in each group.

The following raises TypeError: 'type' object is not iterable:

df = df.groupby('col1')['col2'].agg({'size': len, 'set': set})

But the following works:

def to_set(x):
    return set(x)
    
df = df.groupby('col1')['col2'].agg({'size': len, 'set': to_set})

As I understand it, the two expressions are equivalent. Why does the first one not work?

Trenton McKinney
hangc

4 Answers

21

Update

  • As late as pandas version 0.22, this was still an issue.
  • As of pandas version 1.1.2, this is no longer an issue: aggregating with set does not raise TypeError: 'type' object is not iterable.
    • I am not certain when the behavior changed.

Original Answer

It's because set is of type type, whereas to_set is of type function:

type(set)
<class 'type'>

def to_set(x):
    return set(x)

type(to_set)

<class 'function'>
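
The distinction can be checked in plain Python without pandas. The sketch below only shows that both objects are callable and give the same result, yet only to_set is a plain function. How older pandas versions dispatched on that difference is not shown here, so treat any link to the TypeError as an assumption.

```python
import inspect


def to_set(x):
    return set(x)


# Both objects are callable and produce the same result for the same input.
both_callable = callable(set) and callable(to_set)
same_result = set([1, 2, 2]) == to_set([1, 2, 2])

# Only to_set is a plain Python function; set is a class (an instance of type).
set_is_class = inspect.isclass(set)
set_is_function = inspect.isfunction(set)
to_set_is_function = inspect.isfunction(to_set)
```

Code that decides what to do with an argument by checking for a function, rather than for any callable, would treat the two differently.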

According to the docs, .agg() expects:

arg : function or dict

Function to use for aggregating groups.

  • If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
  • If passed a dict, the keys must be DataFrame column names.

Accepted Combinations are:

  • string cythonized function name
  • function
  • list of functions
  • dict of columns -> functions
  • nested dict of names -> dicts of functions
Trenton McKinney
Stefan
  • For completeness, it raises an error `TypeError: 'type' object is not iterable` and it's probably because if you don't pass a *function* it expects a list of functions. – ayhan Jun 01 '16 at 15:27
11

Try using:

df = df.groupby('col1')['col2'].agg({'size': len, 'set': lambda x: set(x)})

Works for me.

Animesh Mishra
    If you toss a one-liner like you just did you have to include context to explain what you did and why this works. Follow the example of user Stefan will provide you more sympathy and upvotes than just the one-liner. You have to make it understandable for future readers so they can learn from it. – ZF007 Jun 27 '19 at 08:20
  • 2
    `'set': lambda x: set(x)` can be replaced with `'set': set` – Vlas Sokolov Jun 16 '21 at 12:42
5

Update for newer versions of pandas: if you get the following error,

SpecificationError: nested renamer is not supported

pass the aggregations as keyword arguments instead of a dict:

df = df.groupby('col1')['col2'].agg(size=len, set=lambda x: set(x))
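
As a minimal end-to-end sketch of the keyword-argument form, assuming a pandas version that supports named aggregation (0.25 or later): the sample data is hypothetical and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 2]})

# Each keyword becomes an output column; its value is the aggregation function.
result = df.groupby("col1")["col2"].agg(size=len, set=lambda x: set(x))
```

The result is a DataFrame indexed by col1 with one column per keyword, here size and set.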
kindjacket
0

Update for pandas version 1.3.3, in case .agg({'set': set}) produces the following error:

TypeError: Unable to infer the type of the field set

The error persists with the previously suggested .agg({'set': lambda x: set(x)}).

The reason is that set does not satisfy is_list_like in _aggregate (detailed explanation here, courtesy of @EdChum).

A solution is therefore to coerce the result to a list:

.agg({'set': lambda x: list(set(x))})
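
A runnable sketch of the list-coercion idea follows. It departs from the snippet above in one way, which is an assumption on my part: the dict is keyed by an existing column name (col2) and applied to the DataFrame groupby, because a dict that renames the output of a single selected column can trigger the "nested renamer" error in newer pandas. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 2]})

# Dict keys are existing column names; each group's values are
# de-duplicated with set() and then coerced to a list.
result = df.groupby("col1").agg({"col2": lambda x: list(set(x))})
```

The order of items within each list is not guaranteed, since it comes from a set; wrap the result in sorted() if a stable order matters.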

tomlincr
  • First, `.agg({'set': lambda x: list(set(x)}))` does not have matched parenthesis. But if I fix to `.agg({'set': lambda x: list(set(x))})` I still get "SpecificationError: nested renamer is not supported" – InnocentBystander Oct 28 '22 at 04:54