Is there a way in pandas to search in one dataframe to determine what happens in another dataframe?

Question

I'm very new to coding in general, and I'm trying to write a program that can do some data processing for me. I have two data frames: one contains the average of four samples, and the other contains the relative standard deviation for those four samples. I want to set the average to zero if the corresponding relative standard deviation is above a certain number. How would I do that? I was thinking of an if statement, but I don't know where to start when constructing it.

Can you send us an example of your data, your expected result and the code you tried to use? — Maku, Mar 05 '20 at 05:34
Since you're new I highly recommend taking the time to [read this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and keeping it handy. If you want help from the community, posting a good and reproducible question is very important. — Ukrainian-serge, Mar 05 '20 at 05:39

score 0 · Answer 1 · answered Mar 05 '20 at 06:57

You could try something like this:

import pandas as pd

df = pd.DataFrame({'sample':['a','a','a','b','b','b','c','c','c','d','d','d'],
                   'value':[10,20,10,14,24,5,12,13,14,12,4,5]})
# Per-sample mean and sample standard deviation, each as its own dataframe
means = df.groupby(['sample']).agg(mean=('value','mean'))
stds = df.groupby(['sample']).agg(std=('value','std'))
>>> means
         mean
sample
a       13.333333
b       14.333333
c       13.000000
d        7.000000    

>>> stds
        std
sample
a       5.773503
b       9.504385
c       1.000000
d       4.358899

means and stds stand in for the two dataframes you mentioned in the question: means holds the per-sample mean and stds holds the per-sample standard deviation.
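The question asks about the relative standard deviation rather than the raw standard deviation. A minimal sketch of getting one from the other, assuming the relative standard deviation is meant as the standard deviation divided by the mean, expressed as a percentage (the name rsd is introduced here for illustration and is not part of the original answer):

```python
import pandas as pd

# Same toy data as above.
df = pd.DataFrame({'sample': ['a','a','a','b','b','b','c','c','c','d','d','d'],
                   'value': [10,20,10,14,24,5,12,13,14,12,4,5]})

grouped = df.groupby('sample')['value']
means = grouped.mean()   # Series indexed by sample
stds = grouped.std()     # sample standard deviation (pandas default, ddof=1)

# Assumed definition: relative standard deviation as a percentage of the mean.
rsd = stds / means * 100
```

If your relative standard deviations are already computed, you can skip this step and compare that dataframe against the threshold directly.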

Now you could try the following (the variable threshold holds the cut-off value for the standard deviation):

import numpy as np
threshold = 4
# Put the two dataframes side by side, aligned on the sample index
stats = pd.concat([means, stds], axis=1)
>>> stats
         mean       std
sample
a       13.333333  5.773503
b       14.333333  9.504385
c       13.000000  1.000000
d        7.000000  4.358899

# Set mean to 0 wherever std exceeds the threshold; otherwise keep the mean
stats['mean'] = np.where(stats['std']>threshold, 0, stats['mean'])
>>> stats
        mean       std
sample
a        0.0  5.773503
b        0.0  9.504385
c       13.0  1.000000
d        0.0  4.358899
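If the averages and relative standard deviations live in two separate dataframes with the same row and column labels, as described in the question, you may not need to combine them at all. Here is a rough sketch using DataFrame.mask; the data, the names avg, rsd and cleaned, and the threshold of 20 are all made up for illustration:

```python
import pandas as pd

# Hypothetical data: two dataframes with identical index and column labels.
avg = pd.DataFrame({'group1': [13.3, 14.3], 'group2': [13.0, 7.0]},
                   index=['x', 'y'])
rsd = pd.DataFrame({'group1': [43.3, 66.3], 'group2': [7.7, 62.3]},
                   index=['x', 'y'])

threshold = 20  # hypothetical cut-off, in percent

# rsd > threshold is a boolean dataframe; mask() replaces the entries of avg
# with 0 wherever that condition is True and leaves the rest unchanged.
cleaned = avg.mask(rsd > threshold, 0)
```

mask returns a new dataframe and leaves avg untouched. It works element by element across the whole dataframe, so no explicit if statement or loop is needed.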
