How to find standard deviation on filtered data (groupby)

Question

I created a dataframe from an Excel sheet, then filtered it to the rows where the [Date_rank] column is less than 10. The resulting dataframe is named filtered.

I then used g = filtered.groupby('Well_name') to separate the data by well.

Now that the data is grouped by Well_name, how can I find the standard deviation of [RandomNumber] within each group (that is, one standard deviation per well)? Or was it unnecessary to use groupby at all?

import pandas as pd

df = pd.read_csv('here.csv')
print(df)

filtered = df[df['Date_rank'] < 10]  # keep only rows with Date_rank less than 10
print(filtered)

g = filtered.groupby('Well_name')  # group the filtered rows by well name

Here is my data

     Well_name  Date_rank  RandomNumber
0      Velta          1             4
1      Velta          2             5
2      Velta          3             2
3      Velta          4             4
4      Velta          5             4
5      Velta          6             9
6      Velta          7             0
7      Velta          8             9
8      Velta          9             1
9      Velta         10             3
10     Velta         11             8
11     Velta         12             3
12     Velta         13            10
13     Velta         14            10
14     Velta         15             0
15    Ronnie          1             8
16    Ronnie          2             1
17    Ronnie          3             6
18    Ronnie          4             2
19    Ronnie          5             2
20    Ronnie          6             9
21    Ronnie          7             6
22    Ronnie          8             5
23    Ronnie          9             2
24    Ronnie         10             1
25    Ronnie         11             3
26    Ronnie         12             3
27    Ronnie         13             4
28    Ronnie         14             0
29    Ronnie         15             4

This Stack Overflow question and answer may help with the standard deviation calculation: https://stackoverflow.com/questions/15389768/standard-deviation-of-a-list If you intend to compute a single standard deviation over all well names together, then you probably don't need the groupby. - Daileyo, Nov 21 '19 at 02:33
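To illustrate the difference between pooling the wells and grouping them: calling .std() directly on the column gives a single number over all rows, while groupby gives one value per well. This is a minimal sketch that uses only the first three rows of each well from the question's table. The variable names overall_std and per_well_std are illustrative, not from the original post.

```python
import pandas as pd

# First three rows per well, copied from the question's table.
df = pd.DataFrame({
    "Well_name": ["Velta"] * 3 + ["Ronnie"] * 3,
    "Date_rank": [1, 2, 3, 1, 2, 3],
    "RandomNumber": [4, 5, 2, 8, 1, 6],
})

# Without groupby: one standard deviation over all rows, wells pooled together.
overall_std = df["RandomNumber"].std()

# With groupby: one standard deviation per well.
per_well_std = df.groupby("Well_name")["RandomNumber"].std()

print(overall_std)
print(per_well_std)
```

If a separate figure for each well is wanted, the pooled value is not a substitute, since it also mixes in the difference between the two wells.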

score 0 · Answer 1 · answered Nov 21 '19 at 01:49

groupby() works for this, as you suspected. Select the RandomNumber column from the grouped object and call .std() on it, which returns one sample standard deviation (ddof=1 by default) per well:

g = filtered.groupby('Well_name')['RandomNumber'].std()

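Or using .agg(), which returns a DataFrame instead of a Series. Note that the function name is passed as the string 'std', not 'np.std':

g = filtered.groupby('Well_name').agg({'RandomNumber': 'std'})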
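For anyone who wants to try this without the CSV file, here is a self-contained sketch that rebuilds the question's table in memory and runs the filter and groupby steps end to end. It assumes the table in the question is the complete dataset. The names velta, ronnie, per_well_std and per_well_pop_std are illustrative.

```python
import pandas as pd

# RandomNumber values for Date_rank 1..15, copied from the question's table.
velta = [4, 5, 2, 4, 4, 9, 0, 9, 1, 3, 8, 3, 10, 10, 0]
ronnie = [8, 1, 6, 2, 2, 9, 6, 5, 2, 1, 3, 3, 4, 0, 4]

df = pd.DataFrame({
    "Well_name": ["Velta"] * 15 + ["Ronnie"] * 15,
    "Date_rank": list(range(1, 16)) * 2,
    "RandomNumber": velta + ronnie,
})

# Keep only rows with Date_rank less than 10 (ranks 1 through 9).
filtered = df[df["Date_rank"] < 10]

# Sample standard deviation (ddof=1, the pandas default) per well.
per_well_std = filtered.groupby("Well_name")["RandomNumber"].std()

# Population standard deviation (ddof=0), if that is what is needed instead.
per_well_pop_std = filtered.groupby("Well_name")["RandomNumber"].std(ddof=0)

print(per_well_std)
print(per_well_pop_std)
```

With this data the sample standard deviations come out to roughly 3.15 for Velta and 2.92 for Ronnie. Passing ddof=0 gives slightly smaller values, because it divides by n rather than n - 1.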

answered Nov 21 '19 at 01:49

Celius Stingher
