Suppose I have a dataframe with numerous columns, one of which is id.
Suppose that, in a single function, I perform several Python groupby("id") operations, e.g.:
def func(df):
    # Each line calls df.groupby("id") again before taking the cumulative sum.
    df["val1_cumsum"] = df.groupby("id")["val1"].cumsum()
    df["val2_cumsum"] = df.groupby("id")["val2"].cumsum()
    df["val3_cumsum"] = df.groupby("id")["val3"].cumsum()
Do the second and third groupby calls actually do a full groupby like the first one, or is there some native caching in Python that says "we just did this, let's use the previous result"?
In other words, is the above less performant than:
def func(df):
    # Build the groupby object once and reuse it for every column.
    df_groupby_id = df.groupby("id")
    df["val1_cumsum"] = df_groupby_id["val1"].cumsum()
    df["val2_cumsum"] = df_groupby_id["val2"].cumsum()
    df["val3_cumsum"] = df_groupby_id["val3"].cumsum()
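
One way I could check this empirically is to time both variants on synthetic data. Below is a minimal sketch of such a comparison, not a definitive benchmark. The function names repeated_groupby and reused_groupby, the random data, and the sizes are all hypothetical, made up for illustration; the timings will likely depend on the pandas version, the number of ids, and the dataframe size.

```python
import timeit

import numpy as np
import pandas as pd


def repeated_groupby(df):
    # Variant 1: call df.groupby("id") once per column.
    out = df.copy()
    out["val1_cumsum"] = df.groupby("id")["val1"].cumsum()
    out["val2_cumsum"] = df.groupby("id")["val2"].cumsum()
    out["val3_cumsum"] = df.groupby("id")["val3"].cumsum()
    return out


def reused_groupby(df):
    # Variant 2: build the groupby object once and reuse it.
    out = df.copy()
    grouped = df.groupby("id")
    out["val1_cumsum"] = grouped["val1"].cumsum()
    out["val2_cumsum"] = grouped["val2"].cumsum()
    out["val3_cumsum"] = grouped["val3"].cumsum()
    return out


# Hypothetical test data: 100,000 rows spread over 1,000 ids.
rng = np.random.default_rng(0)
n_rows = 100_000
df = pd.DataFrame(
    {
        "id": rng.integers(0, 1_000, size=n_rows),
        "val1": rng.random(n_rows),
        "val2": rng.random(n_rows),
        "val3": rng.random(n_rows),
    }
)

# Both variants should produce identical results; only the timing may differ.
same_result = repeated_groupby(df).equals(reused_groupby(df))

t_repeated = timeit.timeit(lambda: repeated_groupby(df), number=5)
t_reused = timeit.timeit(lambda: reused_groupby(df), number=5)
print(f"same result: {same_result}")
print(f"repeated groupby: {t_repeated:.4f} s for 5 runs")
print(f"reused groupby:   {t_reused:.4f} s for 5 runs")
```

Both functions work on a copy so that the original dataframe is not modified between timing runs.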