
I want to find the mean and standard deviation of the 1st, 2nd, ... digits (i.e. the values at each position) of several lists (A to Z). For example, I have:

A_rank=[0.8,0.4,1.2,3.7,2.6,5.8]
B_rank=[0.1,2.8,3.7,2.6,5,3.4]
C_rank=[1.2,3.4,0.5,0.1,2.5,6.1]
# etc. (up to Z_rank)...

Now I want to take the mean and std of *_rank[0], the mean and std of *_rank[1], etc.
(i.e. the mean and std of the 1st digit from all the (A..Z)_rank lists;
the mean and std of the 2nd digit from all the (A..Z)_rank lists;
the mean and std of the 3rd digit; etc.).

SherylHohman
physics_for_all

8 Answers


Since Python 3.4 (PEP 450) there is a statistics module in the standard library, which has a function stdev for calculating the sample standard deviation of iterables like yours:

>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952
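Applied to the per-position question, one possible sketch collects the lists into a single list (here `all_ranks`, a hypothetical name) and transposes it with the built-in `zip(*...)`, so each tuple holds the values at one position across all lists. `statistics.mean` and `statistics.stdev` are then called on each tuple. Note that `zip` stops at the shortest list, so this assumes every list has the same length.

```python
import statistics

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
B_rank = [0.1, 2.8, 3.7, 2.6, 5, 3.4]
C_rank = [1.2, 3.4, 0.5, 0.1, 2.5, 6.1]

# Hypothetical container: append the remaining lists up to Z_rank here.
all_ranks = [A_rank, B_rank, C_rank]

# zip(*all_ranks) yields (A_rank[0], B_rank[0], C_rank[0]), then position 1, ...
position_means = [statistics.mean(column) for column in zip(*all_ranks)]
# stdev is the sample standard deviation; use statistics.pstdev for the population one.
position_stdevs = [statistics.stdev(column) for column in zip(*all_ranks)]

print(position_means)
print(position_stdevs)
```

If the A to Z lists are the entire population rather than a sample, swapping `stdev` for `pstdev` gives the same numbers as NumPy's default `std`.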
Zach Young
Bengt
  • It's worth pointing out that `pstddev` should probably be used instead if your list represents the entire population (i.e. the list is not a sample of a population). `stddev` is calculated using sample variance and will overestimate the population standard deviation. – Alex Riley Jan 03 '15 at 17:55
  • The functions are actually called [`stdev`](https://docs.python.org/3.4/library/statistics.html#statistics.stdev) and [`pstdev`](https://docs.python.org/3.4/library/statistics.html#statistics.pstdev), not using `std` for `standard` as one would expect. I couldn't edit the post as edits need to modify at least 6 chars... – mknaf Jan 06 '17 at 16:00

I would put A_rank et al. into a 2D NumPy array, and then use numpy.mean() and numpy.std() with axis=0 to compute the means and the standard deviations at each position:

In [17]: import numpy

In [18]: arr = numpy.array([A_rank, B_rank, C_rank])

In [20]: numpy.mean(arr, axis=0)
Out[20]: 
array([ 0.7       ,  2.2       ,  1.8       ,  2.13333333,  3.36666667,
        5.1       ])

In [21]: numpy.std(arr, axis=0)
Out[21]: 
array([ 0.45460606,  1.29614814,  1.37355985,  1.50628314,  1.15566239,
        1.2083046 ])
NPE
  • The result of numpy.std is not correct. Given these values: 20, 31, 50, 69, 80, entered in Excel using STDEV.S(A1:A5), the result is 25.109, NOT 22.45. – Jim Clermonts Oct 01 '15 at 09:28
  • @JimClermonts It has nothing to do with correctness. Whether to use ddof=0 (the default, which interprets the data as the whole population) or ddof=1 (which interprets it as a sample, i.e. estimates the true variance) depends on what you're doing. – runDOSrun Jan 15 '16 at 10:32
  • To further clarify @runDOSrun's point, the Excel function `STDEV.P()` and the Numpy function `std(ddof=0)` calculate the *population* sd, or *uncorrected sample* sd, whilst the Excel function `STDEV.S()` and Numpy function `std(ddof=1)` calculate the *(corrected) sample* sd, which equals sqrt(N/(N-1)) times the population sd, where N is the number of points. See more: https://en.m.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation – binaryfunt Apr 09 '16 at 15:56
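The relationship spelled out in these comments can be checked with the standard-library `statistics` module on the five values from the Excel comment above. In this sketch `pstdev` plays the role of `STDEV.P()` / `numpy.std(ddof=0)` and `stdev` the role of `STDEV.S()` / `numpy.std(ddof=1)`; NumPy itself is not used.

```python
import math
import statistics

values = [20, 31, 50, 69, 80]
n = len(values)

population_sd = statistics.pstdev(values)  # divides by N     (like ddof=0)
sample_sd = statistics.stdev(values)       # divides by N - 1 (like ddof=1)

# The corrected sample sd equals sqrt(N / (N - 1)) times the population sd.
assert math.isclose(sample_sd, population_sd * math.sqrt(n / (n - 1)))

print(round(population_sd, 2), round(sample_sd, 2))  # 22.46 25.11
```

Neither number is "wrong"; they answer different questions, which is the point of the ddof parameter.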

Here's some pure-Python code you can use to calculate the mean and standard deviation.

All code below is based on the statistics module in Python 3.4+.

def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def stddev(data, ddof=0):
    """Calculates the population standard deviation
    by default; specify ddof=1 to compute the sample
    standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    ss = _ss(data)
    pvar = ss/(n-ddof)
    return pvar**0.5

Note: for improved accuracy when summing floats, the statistics module uses a custom function _sum rather than the built-in sum, which I've used in its place.
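If you want to keep the code short but reduce rounding error, one option is `math.fsum`, which returns an accurately rounded floating-point sum. This is a swapped-in technique, not what `statistics._sum` does internally (that helper accumulates exact fractions). A minimal sketch of the `mean` function rewritten this way, under the hypothetical name `fmean_fsum`:

```python
import math

def fmean_fsum(data):
    """Arithmetic mean using math.fsum for an accurately rounded sum."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return math.fsum(data) / n

values = [0.1] * 10
print(sum(values))         # 0.9999999999999999
print(math.fsum(values))   # 1.0
print(fmean_fsum(values))  # 0.1
```

On Python 3.8+, the standard library also offers `statistics.fmean` for a fast floating-point mean.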

Now we have for example:

>>> mean([1, 2, 3])
2.0
>>> stddev([1, 2, 3]) # population standard deviation
0.816496580927726
>>> stddev([1, 2, 3], ddof=1) # sample standard deviation
1.0
Alex Riley

In Python 2.7.1 (and equally in Python 3), you can calculate the standard deviation using numpy.std() for:

  • Population std: Just use numpy.std() with no additional arguments besides your data list.
  • Sample std: You need to pass ddof (i.e. Delta Degrees of Freedom) set to 1, as in the following example:

numpy.std(< your-list >, ddof=1)

The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

With ddof=1, it therefore calculates the sample std rather than the population std.

Alex Riley
Ome

Using Python, here are a few methods:

import math
import statistics as st

n = int(input())
data = list(map(int, input().split()))

Approach 1: using a function

stdev = st.pstdev(data)

Approach 2: calculate the variance and take its square root

variance = st.pvariance(data)
devia = math.sqrt(variance)

Approach 3: using basic math

mean = sum(data)/n
variance = sum((x - mean) ** 2 for x in data) / n
stddev = variance ** 0.5

print("{0:0.1f}".format(stddev))

Note:

  • variance calculates the sample variance (divides by n - 1)
  • pvariance calculates the variance of the entire population (divides by n)
  • the same distinction applies to stdev and pstdev
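To check that the three approaches agree without reading from `input()`, here is a sketch with a hard-coded list (the values are arbitrary, chosen for illustration). All three compute the population standard deviation; `st.stdev` is included only to show that the sample version comes out larger.

```python
import math
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

stdev1 = st.pstdev(data)                      # Approach 1
stdev2 = math.sqrt(st.pvariance(data))        # Approach 2
mean = sum(data) / n                          # Approach 3
stdev3 = (sum((x - mean) ** 2 for x in data) / n) ** 0.5

print(stdev1, stdev2, stdev3)  # 2.0 2.0 2.0
print(st.stdev(data))          # sample sd: divides by n - 1, so it is larger
```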
pankaj

In Python 2.7 you can use NumPy's numpy.std(), which gives the population standard deviation by default.

In Python 3.4+, statistics.stdev() returns the sample standard deviation. The pstdev() function gives the same result as numpy.std() with its default ddof=0.

Community
B.Kocis
  • The `statistics` module accepts `int`, `float`, `Decimal` and `Fraction` -- [`Decimal`](https://docs.python.org/3/library/decimal.html) in particular could be useful if you need exact arithmetic or want to alter the precision. – Sean Breckenridge Mar 25 '22 at 03:22
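As a small illustration of that comment, passing `fractions.Fraction` values keeps the mean and variance exact: here `mean` and `pvariance` return `Fraction` results. The standard deviation itself involves a square root, which is generally irrational, so it usually cannot be represented exactly as a fraction.

```python
import statistics
from fractions import Fraction

data = [Fraction(1, 3), Fraction(2, 3)]

print(statistics.mean(data))       # 1/2
print(statistics.pvariance(data))  # 1/36
```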

Pure Python code:

from functools import reduce  # reduce is no longer a built-in in Python 3
from math import sqrt

def stddev(lst):
    mean = float(sum(lst)) / len(lst)
    return sqrt(float(reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))) / len(lst))
Elad Yehezkel
  • There's nothing 'pure' about that 1-liner. Yuck. Here's a more Pythonic version: `sqrt(sum((x - mean)**2 for x in lst) / len(lst))` – DBrowne Oct 09 '17 at 04:36

The other answers cover how to compute the standard deviation in Python sufficiently, but no one explains how to do the bizarre traversal you've described.

I'm going to assume A-Z is the entire population. If not, see Ome's answer on how to compute the sample standard deviation (ddof=1) instead.

So to get the standard deviation/mean of the first digit of every list you would need something like this:

#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

To shorten the code and generalize this to any nth digit use the following function I generated for you:

def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]] 

Now you can simply get the standard deviation and mean of all the nth places from A-Z like this:

#standard deviation
numpy.std(getAllNthRanks(n))

#mean
numpy.mean(getAllNthRanks(n))
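If the 26 lists can be kept in one container instead of 26 separate variables, the function no longer needs to spell out every name. A sketch assuming a hypothetical dict `ranks` keyed by letter (only three letters filled in here) and a hypothetical `get_all_nth_ranks` helper; `sorted(ranks)` iterates the keys in A-Z order. It uses the standard-library `statistics` module so it runs without NumPy; `pstdev` matches `numpy.std`'s default.

```python
import statistics

# Hypothetical layout: one dict entry per letter instead of A_rank ... Z_rank.
ranks = {
    "A": [0.8, 0.4, 1.2, 3.7, 2.6, 5.8],
    "B": [0.1, 2.8, 3.7, 2.6, 5, 3.4],
    "C": [1.2, 3.4, 0.5, 0.1, 2.5, 6.1],
    # ... remaining letters up to "Z"
}

def get_all_nth_ranks(n):
    """Return the nth value of every list, in alphabetical key order."""
    return [ranks[letter][n] for letter in sorted(ranks)]

first = get_all_nth_ranks(0)     # [0.8, 0.1, 1.2]
print(statistics.mean(first))
print(statistics.pstdev(first))  # population sd, like numpy.std's default
```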
Samie Bencherif
  • For any one interested, I generated the function using this messy one-liner: `str([chr(x)+'_rank[n]' for x in range(65,65+26)]).replace("'", "")` – Samie Bencherif May 22 '17 at 16:13