I am trying to aggregate an entire dataframe using pandas, without grouping by anything.
I need different functions for different columns, so I'm using a dictionary. However, passing 'first' or 'last' as aggregation functions raises ValueError: no results, while others such as 'min', 'max' and 'mean' work without problems.
This is a simplified version of the code:
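import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4], 'Col2': [5, 6, 7, 8], 'Col3': [9, 10, 11, 12]})
# Per-column aggregation spec: lists for Col1 and Col2, a single function for the rest
func = {col: ['first', 'last'] if col in ['Col1']
        else ['first', 'last', 'mean'] if col in ['Col2']
        else 'mean' for col in df.columns}
result = df.agg(func)  # raises ValueError: no results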
Using
result = df.groupby(lambda _ : True).agg(func)
does the job but is quite slow, presumably because of the groupby. The dataframe is already a subset of a larger dataframe and cannot be grouped any further.
I have hundreds of columns, so I cannot aggregate them individually.
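To put a number on the slowdown, something like the following could be used. It is only a sketch: the 200-column random frame, the names wide, wide_func and groupby_workaround, and the choice of three repetitions are all assumptions made for illustration, not my real data.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical wide frame standing in for the real data: 1000 rows, 200 columns
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.random((1000, 200)),
                    columns=[f'Col{i}' for i in range(200)])

# Same kind of per-column spec as above, applied to every column
wide_func = {col: ['first', 'last', 'mean'] for col in wide.columns}

def groupby_workaround():
    # Single artificial group that covers all rows
    return wide.groupby(lambda _: True).agg(wide_func)

# Average wall-clock time over three calls
seconds = timeit.timeit(groupby_workaround, number=3) / 3
print(f'groupby workaround: {seconds:.4f} s per call')
```

The absolute figure depends on the machine and the pandas version, so it is only useful for comparing alternatives on the same setup.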
Is there another way to obtain the first and last row, as well as different aggregations, in a faster/more efficient way than grouping?
For a sample dataframe like this

   Col1  Col2  Col3
0     1     5     9
1     2     6    10
2     3     7    11
3     4     8    12

The output should be

       Col1         Col2              Col3
      first  last  first  last  mean  mean
True      1     4      5     8   6.5  10.5

Edit: As with the original groupby functions, no null values or columns should be removed.
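For illustration, one direction that avoids the groupby is to read 'first' and 'last' positionally and delegate everything else to Series.agg. This is a minimal sketch under assumptions, not a benchmarked solution: the helper name agg_without_groupby is hypothetical, and the index label True is only there to mirror the output above. Because 'first' and 'last' use iloc, a null in the first or last row is returned as-is rather than skipped.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4],
                   'Col2': [5, 6, 7, 8],
                   'Col3': [9, 10, 11, 12]})

# Same spec as in the question
func = {'Col1': ['first', 'last'],
        'Col2': ['first', 'last', 'mean'],
        'Col3': 'mean'}

def agg_without_groupby(frame, spec):
    # Hypothetical helper: builds a one-row frame with (column, function) columns
    values = {}
    for col, names in spec.items():
        # Accept both a single function name and a list of names
        names = [names] if isinstance(names, str) else names
        for name in names:
            if name == 'first':
                values[(col, name)] = frame[col].iloc[0]   # positional, keeps nulls
            elif name == 'last':
                values[(col, name)] = frame[col].iloc[-1]  # positional, keeps nulls
            else:
                values[(col, name)] = frame[col].agg(name)
    columns = pd.MultiIndex.from_tuples(list(values))
    return pd.DataFrame([list(values.values())], columns=columns, index=[True])

out = agg_without_groupby(df, func)
print(out)
```

The Python loop runs once per (column, function) pair, so whether this actually beats the groupby on hundreds of columns would still need to be measured.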