I'm profiling some numeric time measurements that cluster extremely closely. I would like to obtain mean, standard deviation, etc. Some inputs are large, so I thought I could avoid creating lists of millions of numbers and instead use Python collections.Counter objects as a compact representation.
Example: one of my small inputs yields a collections.Counter whose items look like [(48, 4082), (49, 1146)],
which means 4,082 occurrences of the value 48 and 1,146 occurrences of the value 49. For this data set I manually calculate the mean to be something like 48.2192042846.
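For reference, that manual calculation is just a count-weighted average. A minimal sketch of what I did by hand (the variable names are only for illustration):

```python
from collections import Counter

# Counter mapping each measured value to its number of occurrences
counts = Counter({48: 4082, 49: 1146})

# Number of observations represented by the Counter
total = sum(counts.values())

# Sum of all observations, computed without expanding them
weighted_sum = sum(value * n for value, n in counts.items())

mean = weighted_sum / total
print(mean)
```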
Of course if I had a simple list of 4,082 + 1,146 = 5,228 integers I would just feed it to numpy.mean().
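That baseline would look roughly like the sketch below, assuming NumPy is installed. Counter.elements() repeats each value as many times as its count, which rebuilds exactly the full list I would like to avoid for large inputs:

```python
from collections import Counter
import numpy as np

counts = Counter({48: 4082, 49: 1146})

# Expand the Counter into one entry per observation (memory-hungry for large inputs)
expanded = list(counts.elements())

baseline_mean = np.mean(expanded)
baseline_std = np.std(expanded)  # population standard deviation (NumPy default, ddof=0)
```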
My question: how can I calculate descriptive statistics from the values in a collections.Counter object just as if I had a list of numbers? Do I have to create the full list, or is there a shortcut?