
I have this dataframe:

import pandas as pd

df = pd.DataFrame(
    {'cn': [1, 1, 1, 1, 2, 2, 2], 'date': ['01/10/2017', '02/09/2016', '02/10/2016', '01/20/2017', '05/15/2017', '02/10/2016', '02/10/2018'],
     'score': [4, 10, 6, 5, 15, 7, 8]})

    cn  date    score
0   1   01/10/2017  4
1   1   02/09/2016  10
2   1   02/10/2016  6
3   1   01/20/2017  5
4   2   05/15/2017  15
5   2   02/10/2016  7
6   2   02/10/2018  8

I have these two functions:

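def total_count_phq9_BOF_activation(grf):
    # number of non-null scores in the group
    s = grf.score.count()
    return s

def first_phq9_BOF_activation(grf):
    # date(s) of the row(s) with the group's highest score (returned as a Series)
    value = grf[grf.score == grf.score.max()].date
    return value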

Following solution (1), I tried to use these two functions with the apply method:

df.groupby('cn').apply (lambda x: pd.Series({"first_phq9_BOF_activation": first_phq9_BOF_activation , "total_count_phq9_BOF_activation": total_count_phq9_BOF_activation}))

But it did not work: the result contains the function objects themselves instead of their return values. Could you please tell me which part of my code is wrong?
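A minimal sketch of what the code above actually produces, assuming a recent pandas version. Because the dict values are the function objects (there is no `(x)`), `pd.Series` just stores them, so every cell of the grouped result is a function rather than a computed value. Selecting `[['date', 'score']]` before `apply` is an addition here: it only keeps the grouping column `cn` out of each group, which newer pandas versions warn about, and it does not change what the functions see.

```python
import pandas as pd

df = pd.DataFrame(
    {'cn': [1, 1, 1, 1, 2, 2, 2],
     'date': ['01/10/2017', '02/09/2016', '02/10/2016', '01/20/2017',
              '05/15/2017', '02/10/2016', '02/10/2018'],
     'score': [4, 10, 6, 5, 15, 7, 8]})

def total_count_phq9_BOF_activation(grf):
    return grf.score.count()

def first_phq9_BOF_activation(grf):
    return grf[grf.score == grf.score.max()].date

# The dict values are the functions themselves, never called on x,
# so each cell of the result holds a function object.
broken = df.groupby('cn')[['date', 'score']].apply(
    lambda x: pd.Series({
        "first_phq9_BOF_activation": first_phq9_BOF_activation,
        "total_count_phq9_BOF_activation": total_count_phq9_BOF_activation}))

cell = broken.loc[1, "total_count_phq9_BOF_activation"]
print(callable(cell))  # True: a function, not a count
```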

(1) Apply multiple functions to multiple groupby columns

Mary

1 Answer


You didn't call the functions `total_count_phq9_BOF_activation` and `first_phq9_BOF_activation` inside the Series constructor; you only passed the function objects. They are not called by `apply`: they are values in the dict given to the `pd.Series` constructor, so you need to call them explicitly on the group with `(x)`:

df.groupby('cn').apply (lambda x: pd.Series({"first_phq9_BOF_activation": first_phq9_BOF_activation(x) , 
                                             "total_count_phq9_BOF_activation": total_count_phq9_BOF_activation(x)}))

Out[157]:
                    first_phq9_BOF_activation  total_count_phq9_BOF_activation
cn
1   1    02/09/2016
Name: date, dtype: object                                4
2   4    05/15/2017
Name: date, dtype: object                                3
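The output above looks garbled because of what ends up in each cell. Here is a sketch that inspects the cells of the answer's result (same data; the `[['date', 'score']]` column selection is an added assumption to avoid newer pandas' grouping-column warning): the count column holds plain integers, while each `first_phq9_BOF_activation` cell holds a whole Series, and printing a Series always includes its `Name: ..., dtype: ...` footer.

```python
import pandas as pd

df = pd.DataFrame(
    {'cn': [1, 1, 1, 1, 2, 2, 2],
     'date': ['01/10/2017', '02/09/2016', '02/10/2016', '01/20/2017',
              '05/15/2017', '02/10/2016', '02/10/2018'],
     'score': [4, 10, 6, 5, 15, 7, 8]})

def total_count_phq9_BOF_activation(grf):
    # number of non-null scores in the group
    return grf.score.count()

def first_phq9_BOF_activation(grf):
    # boolean-mask selection keeps every row tied for the max, so this is a Series
    return grf[grf.score == grf.score.max()].date

result = df.groupby('cn')[['date', 'score']].apply(
    lambda x: pd.Series({
        "first_phq9_BOF_activation": first_phq9_BOF_activation(x),
        "total_count_phq9_BOF_activation": total_count_phq9_BOF_activation(x)}))

cell = result.loc[1, "first_phq9_BOF_activation"]
print(type(cell).__name__)  # Series
print(cell.tolist())        # ['02/09/2016']
print(result["total_count_phq9_BOF_activation"].tolist())  # [4, 3]
```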
Andy L.
  • L, thank you. The function "first_phq9_BOF_activation" also prints other things ("Name: date, dtype: object"). How can I change the function so that it does not print them? – Mary Nov 13 '19 at 04:00
  • It returns `value`, and `value` is itself a Series. `Name: date, dtype: ...` is part of every printed Series. With the current implementation of `first_phq9_BOF_activation`, you can't get rid of it. There may be another way, but you would need to edit the question and add your desired output. – Andy L. Nov 13 '19 at 04:18
  • L, I fixed this function by adding `.min()` to the date selection, so it is now `def first_phq9_BOF_activation(grf): value = grf[grf.score == grf.score.max()].date.min(); return value`. – Mary Nov 13 '19 at 16:36
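A sketch of the scalar version from the last comment, with one added assumption: the `date` strings are parsed with `pd.to_datetime` first. On raw `MM/DD/YYYY` strings, `.min()` compares character by character, which is not chronological once years differ ('01/20/2017' sorts before '02/10/2016'). Parsing only matters when several rows tie for the max score; with this data there are no ties, so either version returns the same dates.

```python
import pandas as pd

df = pd.DataFrame(
    {'cn': [1, 1, 1, 1, 2, 2, 2],
     'date': ['01/10/2017', '02/09/2016', '02/10/2016', '01/20/2017',
              '05/15/2017', '02/10/2016', '02/10/2018'],
     'score': [4, 10, 6, 5, 15, 7, 8]})

# String comparison is lexicographic, so this picks the later date:
print(min(['02/10/2016', '01/20/2017']))  # 01/20/2017

# Assumption: dates are month/day/year; parse them so min() is chronological.
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

def total_count_phq9_BOF_activation(grf):
    return grf.score.count()

def first_phq9_BOF_activation(grf):
    # .min() collapses the (possibly tied) matching dates into one scalar,
    # so the cell holds a Timestamp instead of a Series.
    return grf.loc[grf.score == grf.score.max(), 'date'].min()

result = df.groupby('cn')[['date', 'score']].apply(
    lambda x: pd.Series({
        "first_phq9_BOF_activation": first_phq9_BOF_activation(x),
        "total_count_phq9_BOF_activation": total_count_phq9_BOF_activation(x)}))
print(result)
```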