numpy mean/std dev of arrays of various lengths

Question

I have a list of NumPy arrays of size 5. The arrays inside the list all have different lengths. I need one array that holds the element-wise means (the mean across the arrays at each position) and one array that holds the element-wise standard deviations. Example:

[10, 10, 10, 10]
[ 8,  8,  8,  8, 8]
[12, 12, 12]

I want:

[10, 10, 10, 9, 8] and
[1.3, 1.3, 1.3, 1.1, 0]

(I made up the std devs)

Thanks in advance!

Divakar · Accepted Answer · 2020-02-23T20:44:54.863

One way is to pad the shorter arrays with NaNs to get a regular 2D array, and then use NumPy's NaN-aware functions, such as nanmean and nanstd (which skip the NaNs when computing), along the appropriate axis, like so -

In [5]: import itertools

In [6]: import numpy as np

# a is the input list of lists/arrays; each row of ar is one element position
In [48]: ar = np.array(list(itertools.zip_longest(*a, fillvalue=np.nan)))

In [49]: np.nanmean(ar,axis=1)
Out[49]: array([10., 10., 10.,  9.,  8.])

In [50]: np.nanstd(ar,axis=1)
Out[50]: array([1.63299316, 1.63299316, 1.63299316, 1.        , 0.        ])
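
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same approach. The list a is assumed to be the three example arrays from the question; the names means and stds are just illustrative.

```python
import itertools

import numpy as np

# Assumed input: the example arrays from the question.
a = [
    np.array([10, 10, 10, 10]),
    np.array([8, 8, 8, 8, 8]),
    np.array([12, 12, 12]),
]

# zip_longest walks the arrays position by position and pads the
# shorter ones with NaN, so each row of ar is one element position
# and each column is one of the input arrays.
ar = np.array(list(itertools.zip_longest(*a, fillvalue=np.nan)))

# Reduce across the arrays (axis=1), ignoring the NaN padding.
means = np.nanmean(ar, axis=1)
stds = np.nanstd(ar, axis=1)  # ddof=0 by default (population std)

print(ar.shape)
print(means)
print(stds)
```

Note that ar has shape (longest length, number of arrays), which is why the reduction runs along axis=1 rather than axis=0.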

Another way is to convert the list to a pandas DataFrame, which fills the missing positions with NaNs, and then use DataFrame methods that skip NaNs natively. Pass ddof=0 to std to match np.nanstd, since pandas defaults to ddof=1. Like so -

In [16]: import pandas as pd

In [17]: df = pd.DataFrame(a)

In [18]: df.mean(0).values
Out[18]: array([10., 10., 10.,  9.,  8.])

In [19]: df.std(0,ddof=0).values
Out[19]: array([1.63299316, 1.63299316, 1.63299316, 1.        , 0.        ])
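
As a runnable sketch of the pandas route, again assuming a holds the example arrays from the question, the following also shows what happens if ddof=0 is left out. The variable names (pop_std, sample_std) are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed input: the example arrays from the question.
a = [
    np.array([10, 10, 10, 10]),
    np.array([8, 8, 8, 8, 8]),
    np.array([12, 12, 12]),
]

# Each input array becomes a row; shorter rows are padded with NaN,
# so each column is one element position.
df = pd.DataFrame(a)

# Reduce down the rows (axis=0); pandas skips NaN by default.
means = df.mean(axis=0).to_numpy()

# ddof=0 gives the population std, matching np.nanstd.
pop_std = df.std(axis=0, ddof=0).to_numpy()

# The pandas default is ddof=1 (sample std), which gives different
# values and NaN for a position that has only one observation.
sample_std = df.std(axis=0).to_numpy()

print(means)
print(pop_std)
print(sample_std)
```

Here the layout is transposed relative to the zip_longest version (arrays are rows, positions are columns), so the reduction runs along axis=0.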

Awesome - thanks! I didn't even think about NaN's not being included in mean/std dev. — Mike Dunn, Feb 23 '20 at 20:44
