I am looking for a way to combine the results of more than one apply() function over my raw data. Here is some simplified code:
import pandas as pd
df = pd.DataFrame({'name': ["alice","bob","charlene","alice","bob","charlene","alice","bob","charlene","edna"],
                   'date': ["2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-02","2020-01-01","2020-01-02","2020-01-01"],
                   'contribution': [5,5,10,20,30,1,5,5,10,100],
                   'payment-type': ["cash","transfer","cash","transfer","cash","transfer","cash","transfer","cash","transfer"]})
df['date'] = pd.to_datetime(df['date'])
def myfunction(input):
    output = input["name"].value_counts()
    output.index.set_names(['name_x'], inplace=True)
    return output
daily_count = df.groupby(pd.Grouper(key='date', freq='1D')).apply(myfunction)
print(daily_count.reset_index())
output:
date name_x name
0 2020-01-01 bob 3
1 2020-01-01 charlene 2
2 2020-01-01 alice 2
3 2020-01-01 edna 1
4 2020-01-02 charlene 1
5 2020-01-02 alice 1
I would like to merge the output of the following code into the previous result:
def myfunction(input):
    output = input["contribution"].sum()
    # output.index.set_names(['name_x'], inplace=True)
    return output
daily_count = df.groupby([pd.Grouper(key='date', freq='1D'), "name"]).apply(myfunction)
Which would give me something like:
date name num_contributions total_pp
0 2020-01-01 bob 3 40
1 2020-01-01 charlene 2 11
2 2020-01-01 alice 2 25
3 2020-01-01 edna 1 100
4 2020-01-02 charlene 1 10
5 2020-01-02 alice 1 5
It's important for me to use apply() because I plan to do some API calls and database lookups in the functions.
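For reference, here is a minimal sketch of the shape I am after. It assumes a single apply() function can return a pd.Series holding both values, so that each group becomes one row. The helper name summarise and the column names num_contributions and total_pp are only illustrative, and the API calls and database lookups are left out:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    'date': pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
                            "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01",
                            "2020-01-02", "2020-01-01"]),
    'contribution': [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
})

def summarise(group):
    # Hypothetical helper: any API call or database lookup would go here.
    # Returning a Series with fixed labels yields one output column per label.
    return pd.Series({
        'num_contributions': len(group),
        'total_pp': group['contribution'].sum(),
    })

combined = (
    df.groupby([pd.Grouper(key='date', freq='1D'), 'name'])[['contribution']]
      .apply(summarise)   # one row per (date, name) group
      .reset_index()      # turn the group keys back into ordinary columns
)
print(combined)
```

Selecting [['contribution']] before apply() keeps the grouping columns out of the frame passed to the function. Rows come out sorted by date and name rather than by count.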
ta, Andrew