
I am trying to evaluate the mean and standard deviation of a list of two huge tensors, with dimensions (79000, 128, 8, 75), for a total of 6,067,200,000 elements. The problem is that, while the calculation of the mean with np.mean is just slow, when I evaluate the standard deviation with np.std the server gets stuck; I am probably running out of memory. Do you know any "smarter" way to evaluate the mean and standard deviation? Thanks.
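One memory-friendly approach (a sketch, not from the original post; the function name and `chunk` parameter are my own) is a two-pass calculation: accumulate the sum chunk by chunk to get the mean, then accumulate squared deviations chunk by chunk, so NumPy never materialises a full-size temporary the way `np.std` does internally:

```python
import numpy as np

def chunked_mean_std(arr, chunk=1000):
    """Two-pass mean/std over `arr`, iterating along the first axis.

    Pass 1 accumulates the sum to obtain the mean; pass 2 accumulates
    squared deviations. Peak extra memory is one chunk, not the whole
    array, and accumulation is done in float64 to limit rounding error.
    """
    n = arr.size
    # Pass 1: mean
    total = 0.0
    for i in range(0, arr.shape[0], chunk):
        total += arr[i:i + chunk].sum(dtype=np.float64)
    mean = total / n
    # Pass 2: population variance (ddof=0, matching np.std's default)
    sq = 0.0
    for i in range(0, arr.shape[0], chunk):
        d = arr[i:i + chunk] - mean          # small float64 temporary
        sq += np.square(d, out=d).sum(dtype=np.float64)
    return mean, np.sqrt(sq / n)
```

It requires reading the data twice, but each pass only touches one chunk at a time, so it should not exhaust memory on arrays of this size.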

Phys
    There's a wonderful article linked in this [question](https://stackoverflow.com/questions/5543651/computing-standard-deviation-in-a-stream) that may help you. I am not sure if there is an implementation available that you can use. – GWW Sep 23 '17 at 23:19
  • following that article above maybe break it down? use `reduce` to get the sum of all elements, `len` to get the number of elements, etc... – gold_cy Sep 23 '17 at 23:23
  • I am using [this version](https://gist.github.com/alexalemi/2151722) of Welford's method for Python, didn't improve much, the server still gets stuck. I have also tested the `reduce` function for a simple sum of elements on a list with the same number of elements, still slow. – Phys Sep 24 '17 at 10:13
  • Solved: in short, I am using code that updates the mean and the standard deviation every time the tensor is filled with a new chunk of data; basically it is a running calculation. – Phys Sep 26 '17 at 13:26
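The running calculation described in the last comment can be sketched as a chunk-wise extension of Welford's method (the pairwise combine formula of Chan et al.; the class name and interface below are my own, not from the post). Each new chunk's count, mean, and sum of squared deviations are folded into the running totals, so only one chunk ever needs to be in memory:

```python
import numpy as np

class RunningStats:
    """Streaming mean/std, updated one chunk at a time."""

    def __init__(self):
        self.n = 0       # elements seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, chunk):
        """Fold a new chunk (any shape) into the running statistics."""
        chunk = np.asarray(chunk, dtype=np.float64)
        n_b = chunk.size
        mean_b = chunk.mean()
        m2_b = ((chunk - mean_b) ** 2).sum()
        # Chan et al. combine: merge (n, mean, m2) with the chunk's stats
        delta = mean_b - self.mean
        n = self.n + n_b
        self.mean += delta * n_b / n
        self.m2 += m2_b + delta ** 2 * self.n * n_b / n
        self.n = n

    def std(self):
        # Population standard deviation (ddof=0), matching np.std's default
        return np.sqrt(self.m2 / self.n)
```

Calling `update` as each chunk of the tensor is filled yields the same mean and standard deviation as computing them over the full data in one shot, without ever holding the full data for the reduction.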

0 Answers