I have a data set containing the following:

Table Example

I need to calculate the mean of the Duration column only for rows where Date is Jan and Condition is Yes. I tried this, but it does not give the correct value:

Jan_Mean = np.where((df['Date']=="Jan")  & (df['Condition']=="Yes"), df["Duration"],0).mean()
ManojK
  • I think you want `Jan_Mean = np.where((df['Date']=="Jan") & (df['Condition']=="Yes"), df["Duration"].mean() ,0)` – yatu Mar 06 '20 at 14:18
  • Welcome to SO!, although this is a small example, for future reference I recommend checking [this post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) on posting `pandas` related questions. – vlizana Mar 06 '20 at 14:26

3 Answers

df.groupby(['Date','Condition']).mean().loc[('Jan','Yes'), 'Duration']

Out[1]:
1.5

Explanation

This gives you the desired data in DataFrame format:

df.groupby(['Date','Condition']).mean()
Out[2]:

                Duration
Date Condition
Feb  Yes             3.0
Jan  Yes             1.5
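
A minimal sketch of the same groupby idea, restricted to the Duration column. The DataFrame below is hypothetical (the question's table is not reproduced here), and the names `means` and `jan_yes_mean` are only illustrative. Selecting the column before calling mean() is an assumption about what is wanted; it keeps any non-numeric columns out of the aggregation.

```python
import pandas as pd

# Hypothetical data standing in for the table in the question.
df = pd.DataFrame({
    "Date": ["Jan", "Jan", "Feb"],
    "Condition": ["Yes", "Yes", "Yes"],
    "Duration": [1, 2, 3],
})

# Select the numeric column before aggregating, so other
# non-numeric columns (if any) are not passed to mean().
means = df.groupby(["Date", "Condition"])["Duration"].mean()

# means is a Series indexed by (Date, Condition) pairs,
# so a tuple label picks out a single scalar.
jan_yes_mean = means.loc[("Jan", "Yes")]
print(jan_yes_mean)  # 1.5
```

Because the result is a Series with a two-level index, every (Date, Condition) combination is available from the one `means` object.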
Alex

Pandas uses NumPy under the hood, so something like

df[(df['Date']=="Jan")  & (df['Condition']=="Yes")]["Duration"].mean()

should do the trick. Here

(df['Date']=="Jan")  & (df['Condition']=="Yes")

is a boolean mask, so applying it to the DataFrame gives you a filtered version of it. You can then select the column and call its methods, such as mean(). More available methods are listed in the pandas documentation.
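
To make the difference from the attempt in the question concrete, here is a small sketch. The DataFrame is made up for illustration (it is not the asker's data), and `mask`, `where_mean`, and `jan_mean` are illustrative names. It suggests why the np.where version is off: the rows that do not match are replaced by 0 and still counted in the average.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the table in the question.
df = pd.DataFrame({
    "Date": ["Jan", "Jan", "Jan", "Feb"],
    "Condition": ["Yes", "Yes", "No", "Yes"],
    "Duration": [1, 2, 10, 3],
})

mask = (df["Date"] == "Jan") & (df["Condition"] == "Yes")

# np.where keeps all four rows and fills non-matching ones with 0,
# so the zeros are averaged in: (1 + 2 + 0 + 0) / 4
where_mean = np.where(mask, df["Duration"], 0).mean()

# Filtering first averages only the matching rows: (1 + 2) / 2
jan_mean = df.loc[mask, "Duration"].mean()

print(where_mean)  # 0.75
print(jan_mean)    # 1.5
```

Here `df.loc[mask, "Duration"]` is equivalent to `df[mask]["Duration"]` for reading values; it just does the row filter and column selection in one step.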

vlizana

How about using groupby and mean?

df.groupby(['Date', 'Condition']).mean().loc[('Jan', 'Yes')]
matthewmturner