Standard deviation of a list

Question

I want to find mean and standard deviation of 1st, 2nd,... digits of several (Z) lists. For example, I have

A_rank=[0.8,0.4,1.2,3.7,2.6,5.8]
B_rank=[0.1,2.8,3.7,2.6,5,3.4]
C_rank=[1.2,3.4,0.5,0.1,2.5,6.1]
# etc (up to Z_rank )...

Now I want to take the mean and std of *_rank[0], the mean and std of *_rank[1], etc.
(i.e. the mean and std of the 1st digit from all the (A..Z)_rank lists;
the mean and std of the 2nd digit from all the (A..Z)_rank lists;
the mean and std of the 3rd digit...; etc).

score 199 · Answer 1 · edited Jun 29 '17 at 17:59

Since Python 3.4 / PEP 450 there is a statistics module in the standard library, which has a function stdev for calculating the sample standard deviation of iterables like yours:

>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952
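To get the per-position statistics the question asks about, one option is to transpose the lists with zip and apply the statistics functions to each column. This is only a sketch using the three lists from the question; the names columns, means, sample_sds and population_sds are made up for illustration:

```python
import statistics

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
B_rank = [0.1, 2.8, 3.7, 2.6, 5, 3.4]
C_rank = [1.2, 3.4, 0.5, 0.1, 2.5, 6.1]

# zip(...) groups the 1st elements together, then the 2nd elements, and so on.
columns = list(zip(A_rank, B_rank, C_rank))

# One statistic per position across all lists.
means = [statistics.mean(col) for col in columns]
sample_sds = [statistics.stdev(col) for col in columns]       # divides by N - 1
population_sds = [statistics.pstdev(col) for col in columns]  # divides by N
```

Whether stdev or pstdev is the right choice depends on whether the lists are a sample or the whole population.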

edited Jun 29 '17 at 17:59

Zach Young

answered Feb 02 '14 at 00:27

Bengt

48

It's worth pointing out that `pstdev` should probably be used instead if your list represents the entire population (i.e. the list is not a sample of a population). `stdev` is calculated using sample variance and will overestimate the population standard deviation. – Alex Riley Jan 03 '15 at 17:55
8

The functions are actually called [`stdev`](https://docs.python.org/3.4/library/statistics.html#statistics.stdev) and [`pstdev`](https://docs.python.org/3.4/library/statistics.html#statistics.pstdev), not using `std` for `standard` as one would expect. I couldn't edit the post as edits need to modify at least 6 chars... – mknaf Jan 06 '17 at 16:00

NPE · Answer 2 · 2013-03-13T15:50:43.370

120

I would put A_rank et al. into a 2D NumPy array, and then use numpy.mean() and numpy.std() to compute the means and the standard deviations:

In [17]: import numpy

In [18]: arr = numpy.array([A_rank, B_rank, C_rank])

In [20]: numpy.mean(arr, axis=0)
Out[20]: 
array([ 0.7       ,  2.2       ,  1.8       ,  2.13333333,  3.36666667,
        5.1       ])

In [21]: numpy.std(arr, axis=0)
Out[21]: 
array([ 0.45460606,  1.29614814,  1.37355985,  1.50628314,  1.15566239,
        1.2083046 ])

edited Mar 13 '13 at 15:50

answered Mar 13 '13 at 15:42

NPE

2

the result of numpy.std is not correct. Given these values: 20,31,50,69,80 and put in Excel using STDEV.S(A1:A5) the result is 25,109 NOT 22,45. – Jim Clermonts Oct 01 '15 at 09:28
24

@JimClermonts It has nothing to do with correctness. Whether to use ddof=0 (the default, which interprets the data as a population) or ddof=1 (which interprets it as a sample, i.e. estimates the true variance) depends on what you're doing. – runDOSrun Jan 15 '16 at 10:32
20

To further clarify @runDOSrun's point, the Excel function `STDEV.P()` and the Numpy function `std(ddof=0)` calculate the *population* sd, or *uncorrected sample* sd, whilst the Excel function `STDEV.S()` and Numpy function `std(ddof=1)` calculate the *(corrected) sample* sd, which equals sqrt(N/(N-1)) times the population sd, where N is the number of points. See more: https://en.m.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation – binaryfunt Apr 09 '16 at 15:56
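The relationship described in these comments can be checked numerically. A minimal sketch, using the standard library's statistics module rather than NumPy or Excel so that it runs without third-party packages (pstdev divides by N like ddof=0, stdev divides by N - 1 like ddof=1), with the values from the first comment:

```python
import math
import statistics

values = [20, 31, 50, 69, 80]
n = len(values)

population_sd = statistics.pstdev(values)  # divisor N, like numpy.std(ddof=0)
sample_sd = statistics.stdev(values)       # divisor N - 1, like numpy.std(ddof=1)

print(round(population_sd, 2), round(sample_sd, 2))  # 22.46 25.11

# The corrected sample sd equals sqrt(N / (N - 1)) times the population sd.
scaled = math.sqrt(n / (n - 1)) * population_sd
```

So both numbers quoted in the comment are valid; they just answer different questions.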

Alex Riley · Answer 3 · 2017-10-08T10:30:41.827

55

Here's some pure-Python code you can use to calculate the mean and standard deviation.

All code below is based on the statistics module in Python 3.4+.

def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def stddev(data, ddof=0):
    """Calculates the population standard deviation
    by default; specify ddof=1 to compute the sample
    standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    ss = _ss(data)
    pvar = ss/(n-ddof)
    return pvar**0.5

Note: for improved accuracy when summing floats, the statistics module uses a custom function _sum rather than the built-in sum which I've used in its place.

Now we have for example:

>>> mean([1, 2, 3])
2.0
>>> stddev([1, 2, 3]) # population standard deviation
0.816496580927726
>>> stddev([1, 2, 3], ddof=1) # sample standard deviation
1.0
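Regarding the note above about accuracy when summing floats: the statistics module's private _sum is not meant to be called directly, but math.fsum is a public standard-library function that also reduces rounding error. The sketch below swaps it in for the built-in sum; it is not what the statistics module does internally, and fsum_mean is a hypothetical name. On recent Python versions the built-in sum is itself more accurate for floats, so the two totals may or may not differ on your interpreter:

```python
import math

values = [0.1] * 10

naive_total = sum(values)           # may carry accumulated rounding error
accurate_total = math.fsum(values)  # tracks partial sums to avoid it


def fsum_mean(data):
    """Arithmetic mean using math.fsum instead of the built-in sum."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return math.fsum(data) / n
```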

edited Oct 08 '17 at 10:30

answered Jan 03 '15 at 18:48

Alex Riley

1

Should it not be `pvar=ss/(n-1)` ? – Ranjith Ramachandra Jun 08 '15 at 13:28
2

@Ranjith: if you want to calculate the *sample* variance (or sample SD) you can use `n-1`. The code above is for the population SD (so there are `n` degrees of freedom). – Alex Riley Jun 08 '15 at 13:38
Hello Alex, could you please post a function for calculating the sample standard deviation? I am limited to Python 2.6, so I have to rely on this function. – Venu S Oct 08 '17 at 00:56
@VenuS: Hello, I've edited the `stddev` function so that it can calculate both sample and population standard deviations. – Alex Riley Oct 08 '17 at 10:29
Isn't the sample standard deviation of that list 1.0? – KansaiRobot Jan 24 '22 at 02:35

score 22 · Answer 4 · edited Sep 15 '15 at 12:49

In Python 2.7.1, you may calculate standard deviation using numpy.std() for:

Population std: Just use numpy.std() with no additional arguments besides your data list.
Sample std: You need to pass ddof (i.e. Delta Degrees of Freedom) set to 1, as in the following example:

numpy.std(< your-list >, ddof=1)

The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

With ddof=1, it calculates the sample std rather than the population std.

score 15 · Answer 5 · answered Apr 14 '19 at 06:49

Using Python, here are a few methods:

import math  # needed for math.sqrt in Approach2
import statistics as st

n = int(input())
data = list(map(int, input().split()))

Approach1 - using a function

stdev = st.pstdev(data)

Approach2: calculate variance and take square root of it

variance = st.pvariance(data)
devia = math.sqrt(variance)

Approach3: using basic math

mean = sum(data)/n
variance = sum([((x - mean) ** 2) for x in data]) / n
stddev = variance ** 0.5

print("{0:0.1f}".format(stddev))

Note:

variance calculates the variance of a sample
pvariance calculates the variance of an entire population
stdev and pstdev differ in the same way
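The three approaches above can be put side by side in one runnable script. This is a sketch that replaces the input() calls with hard-coded example data (the values and the names sd_function, sd_from_variance and sd_by_hand are illustrative, not from the answer):

```python
import math
import statistics as st

data = [20, 31, 50, 69, 80]  # example data standing in for input()
n = len(data)

# Approach 1: library function for the population standard deviation.
sd_function = st.pstdev(data)

# Approach 2: population variance, then square root.
sd_from_variance = math.sqrt(st.pvariance(data))

# Approach 3: basic math.
mean = sum(data) / n
sd_by_hand = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

print("{0:0.1f}".format(sd_function))  # prints 22.5
```

All three give the population standard deviation; swapping in st.stdev and st.variance (and dividing by n - 1 in the third approach) would give the sample version instead.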

score 13 · Answer 6 · edited Apr 13 '17 at 12:19

In Python 2.7 you can use NumPy's numpy.std(), which gives the population standard deviation.

In Python 3.4 statistics.stdev() returns the sample standard deviation. The pstdev() function is the same as numpy.std().

edited Apr 13 '17 at 12:19

Community

answered Apr 24 '14 at 16:15

B.Kocis

The `statistics` module accepts `int`, `float`, `Decimal` and `Fraction` -- [`Decimal`](https://docs.python.org/3/library/decimal.html) in particular could be useful if you need exact decimal arithmetic or want to alter the precision. – Sean Breckenridge Mar 25 '22 at 03:22

Elad Yehezkel · Answer 7 · 2017-06-08T15:24:56.060

5

Pure Python code:

from functools import reduce  # reduce is a builtin in Python 2; Python 3 needs this import
from math import sqrt

def stddev(lst):
    mean = float(sum(lst)) / len(lst)
    return sqrt(float(reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))) / len(lst))

edited Jun 08 '17 at 15:24

answered Jun 08 '17 at 14:45

Elad Yehezkel

13

There's nothing 'pure' about that 1-liner. Yuck. Here's more pythonic version: `sqrt(sum((x - mean)**2 for x in lst) / len(lst))` – DBrowne Oct 09 '17 at 04:36

score 3 · Answer 8 · answered May 22 '17 at 16:11

The other answers cover how to do std dev in python sufficiently, but no one explains how to do the bizarre traversal you've described.

I'm going to assume A-Z is the entire population. If not, see Ome's answer on how to infer from a sample.

So to get the standard deviation/mean of the first digit of every list you would need something like this:

#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

To shorten the code and generalize this to any nth digit use the following function I generated for you:

def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]]

Now you can simply get the std and mean of all the nth places from A-Z like this:

#standard deviation
numpy.std(getAllNthRanks(n))

#mean
numpy.mean(getAllNthRanks(n))

For any one interested, I generated the function using this messy one-liner: `str([chr(x)+'_rank[n]' for x in range(65,65+26)]).replace("'", "")` — Samie Bencherif, May 22 '17 at 16:13
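If the lists can be stored in a container instead of 26 separately named variables, the helper no longer needs to spell out every name. A sketch under that assumption (the ranks dict and the get_all_nth_ranks signature are hypothetical, and only the three lists from the question are filled in), using pstdev because this answer treats A-Z as the whole population:

```python
import statistics

# Hypothetical container: one entry per list from the question.
ranks = {
    "A": [0.8, 0.4, 1.2, 3.7, 2.6, 5.8],
    "B": [0.1, 2.8, 3.7, 2.6, 5, 3.4],
    "C": [1.2, 3.4, 0.5, 0.1, 2.5, 6.1],
}


def get_all_nth_ranks(rank_lists, n):
    """Return the n-th element of every list in rank_lists."""
    return [values[n] for values in rank_lists.values()]


third = get_all_nth_ranks(ranks, 2)   # 3rd element of each list
third_mean = statistics.mean(third)
third_sd = statistics.pstdev(third)   # population standard deviation
```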
