python pandas custom agg function

Question

Dataframe:
  one two
a  1  x
b  1  y
c  2  y
d  2  z
e  3  z

grp = df.groupby('one')
grp.agg(lambda x: ???)  # or equivalent function

Desired output from grp.agg:

one two
1   x|y
2   y|z
3   z

Before I moved to DataFrames, my agg function was "|".join(sorted(set(x))). Ideally I want to allow any number of columns in the group, with agg returning "|".join(sorted(set(x))) for each column, as for column two above. I also tried np.char.join().

I love pandas; it has taken me from an 800-line complicated program to a 400-line walk in the park that zooms. Thank you :)

Zelazny7 · Accepted Answer · 2013-01-09T22:06:38.373

16

You were so close:

In [1]: df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Out[1]:
     two
one
1    x|y
2    y|z
3      z

Expanded answer that also sorts the values and keeps only the unique ones:

In [1]: df = DataFrame({'one':[1,1,2,2,3], 'two':list('xyyzz'), 'three':list('eecba')}, index=list('abcde'), columns=['one','two','three'])

In [2]: df
Out[2]:
   one two three
a    1   x     e
b    1   y     e
c    2   y     c
d    2   z     b
e    3   z     a

In [3]: df.groupby('one').agg(lambda x: "|".join(x.sort_values().unique().tolist()))
Out[3]:
     two three
one
1    x|y     e
2    y|z   b|c
3      z     a
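
For readers on a recent pandas, here is a minimal self-contained sketch of the same idea using the asker's original "|".join(sorted(set(x))) expression. The helper name join_unique is only illustrative and not part of pandas, and the sketch assumes every non-key column holds strings.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")},
    index=list("abcde"),
)


def join_unique(values):
    """Join the distinct values of one column within one group, sorted.

    Hypothetical helper name; assumes the values are strings.
    """
    return "|".join(sorted(set(values)))


# agg applies join_unique to every non-key column of every group.
result = df.groupby("one").agg(join_unique)
print(result)
```

If some columns are not strings, mapping str over the values first (as in the comment below) is one way to avoid a TypeError from join.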

edited Jan 09 '13 at 22:06

answered Jan 09 '13 at 21:42

Zelazny7


Awesome. I was hacking out the awful `grp2.agg(lambda x: u"|".join(sorted(set(map(str, x.tolist())))))`. Thanks for showing me the ropes on using arrays for real! Where is a good reference? Thanks again. – brian_the_bungler Jan 09 '13 at 22:48
Honestly, IPython and experimenting with code snippets have done more for my understanding than any one resource. But Wes McKinney's Python for Data Analysis is a great reference. – Zelazny7 Jan 09 '13 at 23:03
I have been reading the book since Dec but still have lots to practice. FYI I took a look at some of your HDF5 store questions; I ran into the same flexibility problems with it. I work with 3 million row data sets with 60 columns, lots of text, and MongoDB has been a champ. – brian_the_bungler Jan 10 '13 at 03:59
Would you mind sharing some of your MongoDB code and how you use it with pandas? I am trying to nail down a consistent workflow for using pandas with very large datasets (but not 'big' data). I can ask a proper SE question too if you like. I also thought of one more resource: Wes's 2012 PyCon tutorial. It was very thorough and helped cement several concepts for me. – Zelazny7 Jan 10 '13 at 12:51
I would be glad to post it but I think a question format is the way to go. It would be neat to see what others have to say too. I will have time this weekend to do it justice. – brian_the_bungler Jan 10 '13 at 15:18
Thanks, I created a question here: http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas – Zelazny7 Jan 10 '13 at 16:23
in pandas version 1.3.1, .sort() should be replaced with .sort_values() – KH Kim Aug 01 '21 at 06:19

score 2 · Answer 2 · answered Jul 09 '19 at 21:50

Just an elaboration on the accepted answer:

df.groupby('one').agg(lambda x: "|".join(x.tolist()))

Note that df.groupby('one') returns a DataFrameGroupBy, and agg is defined on that type. When agg is given a single function, it calls that function for each remaining column of each group, passing that column's values as a Series. This means that x in the above lambda is a Series.

Also, the aggregation function does not have to be a lambda. If it is complex, it can be defined separately as a regular function, as below. The only constraint is that it must accept a Series (or something compatible with it) as x:

def myfun1(x):
    return "|".join(x.tolist())

and then:

df.groupby('one').agg(myfun1)
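
One way to check what agg actually passes in is to record the argument type inside the function. The sketch below does that; the seen_types list is only an illustrative device, and the exact number of calls may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 1, 2, 2, 3], "two": list("xyyzz")})

# Illustrative bookkeeping: collect the type name of every argument agg passes.
seen_types = []


def myfun1(x):
    seen_types.append(type(x).__name__)
    return "|".join(x.tolist())


grouped = df.groupby("one")
result = grouped.agg(myfun1)

print(type(grouped).__name__)  # type of the groupby object itself
print(set(seen_types))         # type(s) of x seen inside myfun1
print(result)
```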

score 1 · Answer 3 · answered Dec 15 '17 at 11:57

The pandas documentation describes a better way to concatenate strings, Series.str.cat, so I prefer this:

In [1]: df.groupby('one').agg(lambda x: x.str.cat(sep='|'))
Out[1]:
     two
one
1    x|y
2    y|z
3      z
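
Note that str.cat on its own neither sorts nor removes duplicates. If the sorted, de-duplicated output from the question is wanted, a possible sketch is to chain drop_duplicates and sort_values before it; this assumes all non-key columns are string columns, since the .str accessor is not available otherwise.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")}
)

# Drop repeated values, sort what is left, then join with "|".
result = df.groupby("one").agg(
    lambda x: x.drop_duplicates().sort_values().str.cat(sep="|")
)
print(result)
```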

answered Dec 15 '17 at 11:57

Lahiru Karunaratne
