import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], 'y': [7, 8, 9, 10, 11, 12], 'z': ['a', 'a', 'a', 'b', 'b', 'b']})
i = pd.Index([0, 3, 5, 10, 20])
The indices in i are from a larger dataframe, and df is a subset of that larger dataframe, so some of the indices in i will not be in df. When I do
# I know I can just use .aggregate({'y': sum}); this is just an example to illustrate my problem
df.groupby('z').aggregate({'y': lambda x: sum(x.loc[i])})
I get this output:
     y
z
a  NaN
b  NaN
as well as a warning message:
__main__:1: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.
How can I avoid this warning message and get the correct output? In my example the only valid indices for df are [0, 3, 5], so the expected output is:
    y
z
a   7  # "sum" of index 0
b  22  # sum of index [3, 5]
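For reference, here is a minimal sketch of one way this output might be reached, assuming it is acceptable to silently drop the labels of i that a group does not contain. It restricts i to each group's own index with Index.intersection before calling .loc, so no missing label is ever requested. The NaN in my attempt probably comes from the built-in sum() running over the NaN values that .loc filled in for the missing labels, but that is my reading rather than something I have confirmed. The names result and s are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [7, 8, 9, 10, 11, 12],
                   'z': ['a', 'a', 'a', 'b', 'b', 'b']})
i = pd.Index([0, 3, 5, 10, 20])

# For each group, keep only the labels of i that are present in that
# group's index, then sum the selected values.
result = df.groupby('z').aggregate(
    {'y': lambda s: s.loc[s.index.intersection(i)].sum()}
)
print(result)
```

Labels 10 and 20 are simply ignored here; if a missing label should be treated as an error instead, this approach would hide it.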
EDIT
The answers here work great, but they do not allow different types of aggregation for the x and y columns. For example, let's say I want to sum all elements of x, but for y only sum the elements in index i:
df.groupby('z').aggregate({'x':sum, 'y': lambda x: sum(x.loc[i])})
This is the desired output:
    y   x
z
a   7   6
b  22  15
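A sketch of how the mixed aggregation could look, under the assumption that a boolean mask built with Index.isin is an acceptable way to pick out the rows whose labels appear in i. The x column is summed in full, while the y column is masked per group before summing. The final column reordering is only there to match the layout shown above, and result and s are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6],
                   'y': [7, 8, 9, 10, 11, 12],
                   'z': ['a', 'a', 'a', 'b', 'b', 'b']})
i = pd.Index([0, 3, 5, 10, 20])

result = df.groupby('z').aggregate({
    'x': 'sum',                               # sum every element of x
    'y': lambda s: s[s.index.isin(i)].sum(),  # sum only rows whose label is in i
})

# Reorder the columns to match the desired layout (y first, then x).
result = result[['y', 'x']]
print(result)
```

Because the mask is computed from each group's own index, labels in i that do not exist in df never reach the indexer, so no missing-label warning should be triggered.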