I have a list (of size 5) of numpy arrays, and the arrays all have different lengths. I need one array that holds the position-by-position means across the arrays and one array that holds the corresponding standard deviations. Example:

[10, 10, 10, 10]
[ 8,  8,  8,  8, 8]
[12, 12, 12]

I want:

[10, 10, 10, 9, 8] and
[1.3, 1.3, 1.3, 1.1, 0]

(I made up the std devs)

Thanks in advance!

Mike Dunn

1 Answer

One way is to pad the shorter arrays with NaNs so that everything fits into a regular 2D array, and then use NumPy's NaN-aware functions, such as np.nanmean and np.nanstd (which compute the mean and standard deviation while skipping the NaNs), along the appropriate axis:

In [5]: import itertools
   ...: import numpy as np

# a is input list of lists/arrays
In [48]: ar = np.array(list(itertools.zip_longest(*a, fillvalue=np.nan)))

In [49]: np.nanmean(ar,axis=1)
Out[49]: array([10., 10., 10.,  9.,  8.])

In [50]: np.nanstd(ar,axis=1)
Out[50]: array([1.63299316, 1.63299316, 1.63299316, 1.        , 0.        ])
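
For reference, the session above can be put together as a self-contained script. This is only a sketch: it assumes the three example arrays from the question as input, and the helper name ragged_mean_std is made up for illustration (it is not a NumPy function).

```python
import itertools

import numpy as np


def ragged_mean_std(arrays):
    """Per-position mean and population std (ddof=0) of ragged 1-D inputs."""
    # zip_longest groups the i-th elements of every input together and pads
    # the inputs that are too short with NaN.
    padded = np.array(
        list(itertools.zip_longest(*arrays, fillvalue=np.nan)), dtype=float
    )
    # Each row of `padded` is one position, so reduce along axis=1
    # while skipping the NaN padding.
    return np.nanmean(padded, axis=1), np.nanstd(padded, axis=1)


# Example input taken from the question.
a = [
    np.array([10, 10, 10, 10]),
    np.array([8, 8, 8, 8, 8]),
    np.array([12, 12, 12]),
]

means, stds = ragged_mean_std(a)
print(means)
print(stds)
```

The padded array has one row per position and one column per input array, which is why the reduction runs along axis=1 rather than axis=0.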

Another way is to convert the list to a pandas DataFrame, which fills the missing places with NaNs, and then use DataFrame methods, which skip NaNs by default:

In [16]: import pandas as pd

In [17]: df = pd.DataFrame(a)

In [18]: df.mean(0).values
Out[18]: array([10., 10., 10.,  9.,  8.])

In [19]: df.std(0,ddof=0).values
Out[19]: array([1.63299316, 1.63299316, 1.63299316, 1.        , 0.        ])
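
A note on the ddof=0 in the pandas call: pandas' std defaults to the sample standard deviation (ddof=1), whereas np.nanstd defaults to the population one (ddof=0), so the argument is what makes the two approaches agree. The sketch below shows the difference using NumPy only; the NaN-padded array for the question's example is written out by hand here, and the variable names are illustrative.

```python
import warnings

import numpy as np

nan = np.nan
# One row per position, one column per input array from the question.
ar = np.array(
    [
        [10, 8, 12],
        [10, 8, 12],
        [10, 8, 12],
        [10, 8, nan],
        [nan, 8, nan],
    ],
    dtype=float,
)

# Population standard deviation (NumPy's default, ddof=0).
population_std = np.nanstd(ar, axis=1)

# Sample standard deviation (ddof=1). The last row has a single valid value,
# so its degrees of freedom are zero; NumPy warns and returns NaN there.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    sample_std = np.nanstd(ar, axis=1, ddof=1)

print(population_std)
print(sample_std)
```

Which of the two is appropriate depends on whether the arrays are treated as the whole population or as a sample from it; with ddof=1, positions covered by only one array have no defined standard deviation.
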
Divakar