I'm trying to use the pandas split-apply-combine transform, with the twist that the applied function needs to operate on multiple columns. It seems I can't get it to work using groupby(...).transform and have to go the indirect route via groupby(...).apply. Is there a way to do this directly?
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date':[1,1,1,2,2,2],'col1':[1,2,3,4,5,6],'col2':[1,2,3,4,5,6]})
col1 = 'col1'
col2 = 'col2'
def calc(dfg):
    nparray = np.array(dfg[col1])
    somecalc = np.array(dfg[col2])
    # do something with somecalc that helps calculate result
    return nparray - nparray.mean()  # just some dummy data, the function does a complicated calculation
#===> results in: KeyError: 'col1'
df['colnew'] = df.groupby('Date')[col1].transform(calc)
#===> results in: ValueError: could not broadcast input array from shape (9) into shape (9,16) or TypeError: cannot concatenate a non-NDFrame object
df['colnew'] = df.groupby('Date').transform(calc)
#===> this works but feels unnecessary
def applycalc(df):
    df['colnew'] = calc(df)
    return df
df = df.groupby('Date').apply(applycalc)
This post is the closest I found. I would prefer not to pass in all the columns as separate parameters, and on top of that there is a groupby operation involved.
EDIT: Note that I'm not really trying to calculate nparray - nparray.mean(); that's just a dummy calculation. The real function does something complicated and returns an array of shape (group_length, 1). Also, I want to store colnew as a new column in the original dataframe.