I am trying to group a huge DataFrame (3.5 billion observations) by two columns and multiply pairs of columns within each group, as follows:
FirstNeighborVars_s2 = dat2.groupby(by=['NiuCust2', 'year']).agg(
    s2_nv_importing=('NVCost2_sum', lambda x: (x * dat2.loc[x.index, 'importing'])),
    s2_prop_importing=('PROPCost2_sum', lambda x: (x * dat2.loc[x.index, 'importing']))
).reset_index()
This works on a small subset of the data (when dat2 is defined as dat2.head(10000)), but on the full dataset (the one used in the code above) it fails with the following error:
ValueError: Must produce aggregated value
Why does this error arise? And is there another way to perform the series of operations below (this code does not actually run, because a pandas GroupBy object does not support column arithmetic or assignment like this):
FirstNeighborVars_s2=dat2.groupby(by=['NiuCust2', 'year'])
FirstNeighborVars_s2["s2_nv_importing"]=FirstNeighborVars_s2["NVCost2_sum"]*FirstNeighborVars_s2["importing"]
FirstNeighborVars_s2["s2_prop_importing"]=FirstNeighborVars_s2["PROPCost2_sum"]*FirstNeighborVars_s2["importing"]
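To make the goal concrete, here is a minimal sketch on made-up toy data. It rests on two assumptions that may not match the real use case: the row-wise products can be computed directly on dat2 without a groupby, and one value per (NiuCust2, year) group is wanted, taken here as the sum of those products. The intermediate name `products` and all values are hypothetical. Note that agg expects each function to reduce a group to a single value, whereas the lambdas above return a Series as long as the group.

```python
import pandas as pd

# Toy stand-in for dat2; the values are made up for illustration only.
dat2 = pd.DataFrame({
    "NiuCust2": [1, 1, 2, 2],
    "year": [2000, 2000, 2000, 2001],
    "NVCost2_sum": [10.0, 20.0, 30.0, 40.0],
    "PROPCost2_sum": [0.1, 0.2, 0.3, 0.4],
    "importing": [1, 0, 1, 1],
})

# Row-wise products are vectorized column operations and need no groupby.
products = dat2[["NiuCust2", "year"]].copy()
products["s2_nv_importing"] = dat2["NVCost2_sum"] * dat2["importing"]
products["s2_prop_importing"] = dat2["PROPCost2_sum"] * dat2["importing"]

# Assumption: one value per (NiuCust2, year) is wanted, taken here as the sum.
FirstNeighborVars_s2 = products.groupby(["NiuCust2", "year"], as_index=False)[
    ["s2_nv_importing", "s2_prop_importing"]
].sum()

print(FirstNeighborVars_s2)
```

If the per-row products themselves are the desired result, the `products` frame already holds them and the final groupby step can be dropped.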
Thanks a lot