0

I have this data frame called sum_2.

I want to be able to take the mean of (total_tom + total_Tomato_CalR + total_CalR) while ignoring NaN values and grouping by Time, Type_of_mouse, and Mouse_number all at the same time since I don't want different Times, Type_of_mouse, and Mouse_number values adding with each other.

The output that I ideally want is (126.0 + 91.0 + 53.0)/3 = 90 (90 being the mean from adding the values altogether).

I believe that pandas will automatically ignore NaN values when using .sum() and .mean() (correct me on this if I am wrong).

In [1]: sum__2
Out[1]:
  
Time    Type_of_mouse   Mouse_number    total_tom   total_Tomato_CalR   total_CalR
15w pTam    FF  6045    126.0       
15w pTam    FF  6551        91.0    
15w pTam    FF  9729    53.0        
15w pTam    WT  6046        109.0   
15w pTam    WT  6156        106.0   
15w pTam    WT  6835        129.0   
15w pTam    WT  9631    108.0       
15w pTam    WT  A859    238.0       
2w pTam FF  6606    70.0        
2w pTam FF  6609    78.0        
2w pTam FF  6649            178.0
2w pTam FF  9205            87.0
2w pTam FF  9211            133.0
2w pTam WT  6050    118.0       
2w pTam WT  A1257   138.0       
8m pTam FF  8998    165.0       
8m pTam FF  9004    106.0       
8m pTam FF  9007    91.0        
8m pTam FF  9048    100.0       
8m pTam WT  9005    184.0       
8m pTam WT  9043    173.0       
8m pTam WT  9045    163.0       
8m pTam WT  9224    187.0       

Alex
  • 25
  • 3
  • Please don't post images. Take a look at [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Shubham Sharma Jun 23 '20 at 09:53

3 Answers3

1

IIUC, do you need:

grp = sum_df.groupby(['Time', 'Type_of_mouse'])
grp.sum().sum(axis=1) / grp.count().sum(axis=1)

Output:

Time      Type_of_mouse
15w pTam  FF                90.00
          WT               138.00
2w pTam   FF               109.20
          WT               128.00
8m pTam   FF               115.50
          WT               176.75
dtype: float64
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
0

you can use agg() function after groupby. In pass column name and the function you want to preform in key value pair ex:

df1=df.groupby([some columns]).agg({'coulmn_name_to_agg':'sum/mean/count','another_column':'sum/mean/count'})

link to agg doc.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html

Amit Kumar
  • 613
  • 3
  • 15
0

You can use .groupby and .transform and add the columns together that you want to add.

sum_2['new_col'] = sum_2.groupby(['Time','Type_of_mouse', 'Mouse_number'])['total_tom'].transform(mean) + sum_2.groupby(['Time','Type_of_mouse', 'Mouse_number'])['total_Tomato_CalR'].transform(mean) + sum_2.groupby(['Time','Type_of_mouse', 'Mouse_number'])['total_CalR'].transform(mean)

this might also be a more concise version:

gb = sum_2.groupby(['Time','Type_of_mouse', 'Mouse_number'])
sum_2['new_col'] = gb.total_tom.transform(mean) + gb.total_Tomato_CalR.transform(mean) + gb.total_CalR.transform(mean)
David Erickson
  • 16,433
  • 2
  • 19
  • 35
  • Perhaps my wording was confusing or just outright wrong, but I have edited my question to include (126.0 + 91.0 + 53.0)/3, which are the values of 'total_tom', 'total_Tomato_CalR', and 'total_CalR' in the parentheses respectively. I'm only adding this comment because I take it that you understood my question as saying that I want the mean of each of those columns. – Alex Jun 23 '20 at 10:16
  • can you provide the example data above in a copy and pasteable format? See here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Jun 23 '20 at 16:35
  • Just posted. Let me know if I should provide more examples or information. – Alex Jun 23 '20 at 19:12