In NumPy, the function for calculating the standard deviation expects a list of values like [1, 2, 1, 1] and calculates the standard deviation from those. In my case, I have a nested list of values and counts like [[1, 2], [3, 1]], where the first list contains the values and the second contains how often each corresponding value appears.

I am looking for a clean way of calculating the standard deviation for a list like the one above, where "clean" means one of:

  • an already existing function in numpy, scipy, pandas etc.
  • a more pythonic approach to the problem
  • a more concise and nicely readable solution

I already have a working solution that converts the nested value/count list into a flattened list of values and calculates the standard deviation with the function above, but I don't find it very pleasing and would rather have another option.

A minimal working example of my workaround is

import numpy as np

# The usual way
values = [1,2,1,1]
deviation = np.std(values)
print(deviation)

# My workaround for the problem
value_counts = [[1, 2], [3, 1]]
values, counts = value_counts
flattened = []
for value, count in zip(values, counts):
    # append the current value count times
    flattened.extend([value] * count)
deviation = np.std(flattened)
print(deviation)

The output is

0.4330127018922193
0.4330127018922193

Thanks for any ideas or suggestions :)

marsheep
  • "I am looking for a clean way of calculating the standard deviation" your working solution looks clean enough. Sure, you can make it more concise by using list comprehensions, lambdas and whatnot, but unless you can define what you mean by "clean way", your solution is as good as any for your question. – jfaccioni May 04 '19 at 11:08
  • Thanks for the feedback! I added a hopefully more detailed explanation of what I mean by "clean". – marsheep May 04 '19 at 11:25
  • Possible duplicate of [Weighted standard deviation in NumPy](https://stackoverflow.com/questions/2413522/weighted-standard-deviation-in-numpy) – Zaccharie Ramzi May 04 '19 at 14:58

1 Answer

You are simply looking for numpy.repeat.

import numpy

# Repeat each value by its count, then take the standard deviation as usual
numpy.std(numpy.repeat(value_counts[0], value_counts[1]))
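
To make the one-liner easier to follow, here is a minimal sketch that applies it to the example data from the question and shows the intermediate array. The variable names `expanded` and `deviation` are only illustrative.

```python
import numpy as np

value_counts = [[1, 2], [3, 1]]
values, counts = value_counts

# np.repeat repeats each element of `values` by the matching entry of `counts`
expanded = np.repeat(values, counts)
print(expanded.tolist())

# np.std computes the population standard deviation (ddof=0) by default
deviation = np.std(expanded)
print(deviation)
```

This should print the same standard deviation as the loop-based workaround in the question, since both build the same flattened list of values.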
Patol75
    This does the job but might be memory-inefficient if you have high counts. To leverage the sparsity of your data, you may want to use the answer provided [here](https://stackoverflow.com/questions/2413522/weighted-standard-deviation-in-numpy) – Zaccharie Ramzi May 05 '19 at 11:33
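
The idea in this comment can be sketched without expanding the data at all, by treating the counts as weights. The helper `weighted_std` below is a hypothetical name, not a NumPy function. It relies on `np.average` with its `weights` argument and computes the population standard deviation, which matches the default behaviour of `np.std`.

```python
import numpy as np

def weighted_std(values, counts):
    """Population standard deviation of values weighted by their counts.

    Hypothetical helper for illustration; avoids building the flattened array.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Weighted mean of the distinct values
    mean = np.average(values, weights=counts)
    # Weighted mean of the squared deviations from that mean
    variance = np.average((values - mean) ** 2, weights=counts)
    return np.sqrt(variance)

value_counts = [[1, 2], [3, 1]]
result = weighted_std(*value_counts)
print(result)
```

Memory use here depends on the number of distinct values rather than the sum of the counts, so it should stay small even when individual counts are very large.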