Means of asymmetric arrays in numpy

Question

I have an asymmetric (ragged) 2D array in numpy, i.e. some rows are longer than others, such as: [[1, 2], [1, 2, 3], ...]

But numpy doesn't seem to like this:

import numpy as np

foo = np.array([[1], [1, 2]])
foo.mean(axis=1)

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tom/.virtualenvs/nlp/lib/python3.5/site-packages/numpy/core/_methods.py", line 56, in _mean
    rcount = _count_reduce_items(arr, axis)
  File "/home/tom/.virtualenvs/nlp/lib/python3.5/site-packages/numpy/core/_methods.py", line 50, in _count_reduce_items
    items *= arr.shape[ax]
IndexError: tuple index out of range

Is there a nice way to do this or should I just do the maths myself?

It's obvious that you don't have any item in the first row's second axis. If you want to calculate the mean of all the items, you can flatten the array and then calculate the mean. — Mazdak, Sep 20 '16 at 12:13
possible duplicate? http://stackoverflow.com/questions/10058227/python-calculating-mean-of-arrays-with-different-lengths — AlvaroP, Sep 20 '16 at 12:16
What do you mean by `nice way` here? Efficient or short/compact code? — Divakar, Sep 20 '16 at 12:21
I think I'm confused, I thought the axes ran horizontally, not vertically. Indeed, that seems to be the case when I run it on a symmetric data set, but maybe I'm misreading it. By nice way, I mean sticking within numpy rather than writing something myself or using `statistics` from stdlib. — Tom Carrick, Sep 20 '16 at 12:24
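The axis confusion above can be checked directly. This is a minimal sketch, assuming a recent NumPy where ragged input has to be given `dtype=object` explicitly (the 2016 version in the question did this implicitly). The result is a 1-D array whose elements are Python lists, so only axis 0 exists.

```python
import numpy as np

# Ragged rows cannot form a rectangular 2-D array; with dtype=object
# NumPy stores each row as a Python list inside a 1-D array.
foo = np.array([[1], [1, 2]], dtype=object)

print(foo.ndim)   # number of axes
print(foo.shape)  # only axis 0 exists, so axis=1 is out of range
```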

Bertrand Gazanion · Answer 1 · 2016-09-20T12:45:32.287

2

You could perform the mean for each sub-array of foo using a list comprehension:

mean_foo = np.array( [np.mean(subfoo) for subfoo in foo] )

As suggested by @Kasramvd in another answer's comment, you can also use the map function. On Python 3, map returns an iterator, so wrap it in list() before passing it to np.array:

mean_foo = np.array(list(map(np.mean, foo)))
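For reference, a self-contained sketch of both variants on Python 3. The three-row `foo` here is a hypothetical input chosen for illustration, not the data from the question.

```python
import numpy as np

# Hypothetical ragged input: rows of different lengths.
foo = [[1], [1, 2], [1, 2, 3]]

# One np.mean call per row; each row is reduced on its own.
mean_foo = np.array([np.mean(row) for row in foo])

# Equivalent with map; list() is needed because map is lazy on Python 3.
mean_foo_map = np.array(list(map(np.mean, foo)))

print(mean_foo)
```

Both variants still loop over the rows in Python, so they are simple rather than vectorized.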

edited Sep 20 '16 at 12:45

answered Sep 20 '16 at 12:15


Divakar · Accepted Answer · 2016-09-20T12:50:48.937

2

We could use an almost vectorized approach based upon np.add.reduceat that takes care of the irregular length subarrays, for which we are calculating the average values. np.add.reduceat sums up elements in those intervals of irregular lengths after getting a 1D flattened version of the input array with np.concatenate. Finally, we need to divide the summations by the lengths of those subarrays to get the average values.

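Thus, the implementation would look something like this:

lens = np.array(list(map(len, foo)))  # Thanks to @Kasramvd on this! list() is needed on Python 3
vals = np.concatenate(foo)  # flatten all subarrays into one 1D array
shift_idx = np.append(0, lens[:-1].cumsum())  # start index of each subarray within vals
out = np.add.reduceat(vals, shift_idx) / lens.astype(float)  # per-subarray sum divided by its length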
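The steps above can be wrapped into a small runnable sketch. The function name `ragged_means` and the sample input are hypothetical, added only for illustration. The sketch assumes every row is non-empty: `np.add.reduceat` does not return a zero sum for an empty interval, so empty rows would give wrong results or an error.

```python
import numpy as np

def ragged_means(rows):
    """Mean of each row in a list of non-empty, variable-length rows."""
    # Length of every row, used both for the offsets and the division.
    lens = np.array([len(r) for r in rows])
    # Flatten all rows into a single 1-D float array.
    vals = np.concatenate([np.asarray(r, dtype=float) for r in rows])
    # Start index of each row inside the flattened array.
    starts = np.append(0, lens[:-1].cumsum())
    # Sum each [start, next_start) interval, then divide by the row length.
    return np.add.reduceat(vals, starts) / lens

foo = [[1], [1, 2], [1, 2, 3]]  # hypothetical ragged input
out = ragged_means(foo)
print(out)
```

Only the length computation and the flattening iterate over rows in Python; the summation and division run inside NumPy, which is why the answer calls the approach almost vectorized.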

edited Sep 20 '16 at 12:50

answered Sep 20 '16 at 12:27


1

It's better to use `map` instead of list comprehension, when you are dealing with built-in functions. It performs slightly faster. – Mazdak Sep 20 '16 at 12:30
1

@Kasramvd Awesome! Thanks, added to post. – Divakar Sep 20 '16 at 12:33
