Calculate mean with NumPy based on other column values

Question

I have a data set containing the following:

And I need to calculate the mean of the Duration column only for rows where Date is Jan and Condition is Yes. I tried this, but it does not give the correct value:

Jan_Mean = np.where((df['Date']=="Jan")  & (df['Condition']=="Yes"), df["Duration"],0).mean()

I think you want `Jan_Mean = np.where((df['Date']=="Jan") & (df['Condition']=="Yes"), df["Duration"].mean() ,0)` — yatu, Mar 06 '20 at 14:18
Welcome to SO! Although this is a small example, for future reference I recommend checking [this post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) on posting `pandas`-related questions. — vlizana, Mar 06 '20 at 14:26
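A minimal sketch of why the `np.where` attempt comes out wrong. The question's actual data is not shown, so the DataFrame below is hypothetical and made up for illustration. Filling non-matching rows with 0 keeps those rows in the denominator of `.mean()`, which drags the result down. Two NumPy-only fixes are to fill non-matching rows with `np.nan` and average with `np.nanmean`, or to index the Duration values with the mask directly so only matching values are averaged.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's original table is not shown
df = pd.DataFrame({
    "Date": ["Jan", "Jan", "Feb", "Jan", "Feb"],
    "Condition": ["Yes", "Yes", "Yes", "No", "Yes"],
    "Duration": [1, 2, 3, 5, 3],
})

mask = (df["Date"] == "Jan") & (df["Condition"] == "Yes")

# Original attempt: non-matching rows become 0 but still count in the denominator
wrong = np.where(mask, df["Duration"], 0).mean()  # (1 + 2 + 0 + 0 + 0) / 5

# Fix 1: mark non-matching rows as NaN and let np.nanmean skip them
right_nan = np.nanmean(np.where(mask, df["Duration"], np.nan))

# Fix 2: keep only the matching values with the boolean mask, then average
right_mask = df["Duration"].to_numpy()[mask.to_numpy()].mean()

print(wrong, right_nan, right_mask)  # 0.6 1.5 1.5
```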

Alex · Answer 1 · score 1 · 2020-03-06T14:30:10.323

df.groupby(['Date', 'Condition']).mean().loc[('Jan', 'Yes'), 'Duration']

Out[1]:
1.5

Explanation

This gives you the desired data in DataFrame format:

df.groupby(['Date','Condition']).mean()
Out[2]:

                Duration
Date    Condition   
Feb     Yes     3.0
Jan     Yes     1.5
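A small sketch of the grouped table in action, again on a hypothetical DataFrame since the question's data is not shown. Each row of the result holds the mean Duration for one (Date, Condition) pair, so the Jan/Yes value can be read by its labels. The `xs` call is an extra illustration, not part of the answer above: it takes the cross-section for Condition == "Yes" to get every month's mean at once.

```python
import pandas as pd

# Hypothetical sample data; the question's original table is not shown
df = pd.DataFrame({
    "Date": ["Jan", "Jan", "Feb", "Jan", "Feb"],
    "Condition": ["Yes", "Yes", "Yes", "No", "Yes"],
    "Duration": [1, 2, 3, 5, 3],
})

# One row per (Date, Condition) pair, holding the mean Duration of that group
means = df.groupby(["Date", "Condition"]).mean()
print(means)

# Read a single cell by its labels rather than by position
jan_yes = means.loc[("Jan", "Yes"), "Duration"]
print(jan_yes)  # 1.5

# Cross-section: every month's mean where Condition == "Yes"
yes_by_month = means.xs("Yes", level="Condition")["Duration"]
print(yes_by_month.to_dict())  # {'Feb': 3.0, 'Jan': 1.5}
```

Looking the value up by label avoids positional indexing such as `[0]` on a label-indexed Series, which recent pandas versions warn about.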

edited Mar 06 '20 at 14:30

answered Mar 06 '20 at 14:25

Alex


Not exactly what I needed but it can definitely help in the future. Thanks!! – jose Gajardo Mar 06 '20 at 14:36

score 0 · Accepted Answer · answered Mar 06 '20 at 14:23

Pandas uses NumPy under the hood, so something like

df[(df['Date']=="Jan")  & (df['Condition']=="Yes")]["Duration"].mean()

should do the trick. Here

(df['Date']=="Jan")  & (df['Condition']=="Yes")

is a boolean mask. Indexing the DataFrame with it returns a filtered version containing only the matching rows; selecting the Duration column from that gives a Series, whose methods (such as mean()) you can then call. More available methods here.
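A step-by-step sketch of the boolean-mask approach on a hypothetical DataFrame (the question's real data is not shown). It prints the mask, filters the rows, and then does the same selection in a single `.loc[rows, column]` call, which is an equivalent alternative to chaining `df[mask]["Duration"]`.

```python
import pandas as pd

# Hypothetical sample data; the question's original table is not shown
df = pd.DataFrame({
    "Date": ["Jan", "Jan", "Feb", "Jan", "Feb"],
    "Condition": ["Yes", "Yes", "Yes", "No", "Yes"],
    "Duration": [1, 2, 3, 5, 3],
})

# Element-wise comparisons give boolean Series; & combines them row by row.
# The parentheses are required because & binds tighter than ==.
mask = (df["Date"] == "Jan") & (df["Condition"] == "Yes")
print(mask.tolist())  # [True, True, False, False, False]

# Indexing with the mask keeps only the True rows
filtered = df[mask]
print(len(filtered))  # 2

# Same selection in one step with .loc[rows, column]
jan_yes_mean = df.loc[mask, "Duration"].mean()
print(jan_yes_mean)  # 1.5
```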

score 0 · Answer 3 · answered Mar 06 '20 at 14:23


How about using groupby and mean?

df.groupby(['Date', 'Condition']).mean().loc[('Jan', 'Yes')]
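A short sketch, on a hypothetical DataFrame, of what this lookup returns. Selecting a full (Date, Condition) key on the grouped DataFrame gives back that row as a Series, so one more `["Duration"]` step is needed for a plain number. Grouping only the Duration column (a variation, not the answer's exact code) produces a Series keyed by (Date, Condition), where the same tuple lookup yields the scalar directly.

```python
import pandas as pd

# Hypothetical sample data; the question's original table is not shown
df = pd.DataFrame({
    "Date": ["Jan", "Jan", "Feb", "Jan", "Feb"],
    "Condition": ["Yes", "Yes", "Yes", "No", "Yes"],
    "Duration": [1, 2, 3, 5, 3],
})

means = df.groupby(["Date", "Condition"]).mean()

# A full (Date, Condition) key returns that row as a Series,
# indexed by the remaining column names (here just "Duration")
row = means.loc[("Jan", "Yes")]
print(type(row).__name__, row["Duration"])  # Series 1.5

# Grouping only Duration gives a Series keyed by (Date, Condition),
# so the same tuple lookup yields the scalar directly
duration_means = df.groupby(["Date", "Condition"])["Duration"].mean()
print(duration_means.loc[("Jan", "Yes")])  # 1.5
```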

answered Mar 06 '20 at 14:23

matthewmturner

