I have a large DataFrame along the lines of the one below, and I am looking to calculate either mean or mode on particular columns. I am looking to groupby a feature (whether it is on or off) and also an ID column (1 or 2).
MY_DF
Feature ID ITEM1 ITEM2 ITEM3
index
1 ON 1 2.3 544 32.2
2 OFF 1 2.55 544 23.4
3 ON 2 8.32 674 15.5
4 ON 1 1.56 544 43.4
5 OFF 2 3.55 674 32.4
I am able to do both tasks separately to find mean and mode for particular columns, but I can't work out how to join the two outputs together, but then I was also wondering if there is a way to calculate mean or mode of selected columns in one hit?
In my current method, although I am able to do both tasks separetely I am struggling to join the two outputs together.
Here's my separate methods
#as mode is not an official function in Pandas, set out method here
from scipy.stats.mstats import mode
f = lambda x: mode(x, axis=None)[0]
means = MY_DF.groupby(['Feature','ID'])[ITEM1,ITEM3].mean()
moder = MY_DF.groupby(['Feature','ID'])[ITEM2].apply(f)
My mean method outputs something like this:
ITEM1 ITEM3
ON 1 1.93 37.8
2 8.32 15.5
OFF 1 2.55 23.4
2 3.55 32.4
My mode method outputs something like this:
ITEM2
ON 1 [544]
2 [674]
OFF 1 [544]
2 [674]
Im looking for a final output something like this if possible?
ITEM1 ITEM3 ITEM2
ON 1 1.93 37.8 544
2 8.32 15.5 674
OFF 1 2.55 23.4 544
2 3.55 32.4 674