I'm very new to coding in general, and I'm trying to make a program that can do some data processing for me. I have two data frames: one that contains the average of four samples, and one that contains the relative standard deviation for those four samples. I want to set the average to zero if the relative standard deviation is above a certain value. How would I do that? I was thinking of an if statement, but I don't know where to start when constructing it.
Can you send us an example of your data, your expected result and the code you tried to use? – Maku Mar 05 '20 at 05:34
Since you're new I highly recommend taking the time to [read this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and keeping it handy. If you want help from the community, posting a good and reproducible question is very important. – Ukrainian-serge Mar 05 '20 at 05:39
1 Answer
You could try something like this:
```python
import pandas as pd

df = pd.DataFrame({'sample': ['a','a','a','b','b','b','c','c','c','d','d','d'],
                   'value': [10,20,10,14,24,5,12,13,14,12,4,5]})
# Mean and (sample) standard deviation of 'value' for each sample
means = df.groupby(['sample']).agg(mean=('value', 'mean'))
stds = df.groupby(['sample']).agg(std=('value', 'std'))
>>> means
             mean
sample
a       13.333333
b       14.333333
c       13.000000
d        7.000000
>>> stds
             std
sample
a       5.773503
b       9.504385
c       1.000000
d       4.358899
```
`means` and `stds` play the role of the two dataframes you mentioned in the question: `means` holds the mean and `stds` holds the standard deviation.
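In the question, the second dataframe holds the relative standard deviation (RSD) rather than the plain standard deviation. Assuming the usual definition, RSD = std / mean × 100 (in percent), a minimal sketch for deriving it from the same grouped data could look like this (the column name `rsd` is just my choice):

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'sample': ['a','a','a','b','b','b','c','c','c','d','d','d'],
                   'value': [10,20,10,14,24,5,12,13,14,12,4,5]})

grouped = df.groupby('sample')['value']
# RSD in percent: sample standard deviation (ddof=1, the pandas default)
# divided by the mean, times 100
rsds = (grouped.std() / grouped.mean() * 100).to_frame('rsd')
print(rsds)
```

With this data only sample `c` has an RSD below 10 % (about 7.7 %); the other three are above 40 %, so the threshold would have to be chosen on that percentage scale.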
Now you could try the following (the variable `threshold` holds the cut-off value for the standard deviation):
```python
import numpy as np

threshold = 4  # cut-off for the standard deviation
stats = pd.concat([means, stds], axis=1)
>>> stats
             mean       std
sample
a       13.333333  5.773503
b       14.333333  9.504385
c       13.000000  1.000000
d        7.000000  4.358899
# Set the mean to 0 wherever the standard deviation exceeds the threshold
stats['mean'] = np.where(stats['std'] > threshold, 0, stats['mean'])
>>> stats
        mean       std
sample
a        0.0  5.773503
b        0.0  9.504385
c       13.0  1.000000
d        0.0  4.358899
```
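If, as in the question, the averages and the RSD values already live in two separate dataframes with the same row index and column labels, the `concat` step isn't strictly needed: `DataFrame.mask` replaces values wherever a boolean condition is True. A minimal sketch, assuming a hypothetical layout (rows are measurements, one column per sample), made-up numbers, and a hypothetical 20 % cut-off:

```python
import pandas as pd

# Hypothetical data: rows are measurements, columns are the four samples
avg = pd.DataFrame({'s1': [10.0, 12.5], 's2': [8.0, 9.1],
                    's3': [11.2, 7.4], 's4': [9.9, 10.3]},
                   index=['m1', 'm2'])
rsd = pd.DataFrame({'s1': [5.0, 25.0], 's2': [30.0, 4.0],
                    's3': [12.0, 18.0], 's4': [21.0, 2.5]},
                   index=['m1', 'm2'])

threshold = 20  # hypothetical RSD cut-off, in percent

# mask() keeps avg where the condition is False and writes 0 where it is True;
# the two frames are aligned on their index and column labels
result = avg.mask(rsd > threshold, 0)
print(result)
```

Because the comparison `rsd > threshold` produces a boolean dataframe, no explicit `if` statement or loop is needed; the condition is applied element-wise to every cell at once.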

Sajan