
I have a ragged 2D array in numpy (some rows are longer than others), such as: [[1, 2], [1, 2, 3], ...]

But numpy doesn't seem to like this:

import numpy as np

foo = np.array([[1], [1, 2]])
foo.mean(axis=1)

Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tom/.virtualenvs/nlp/lib/python3.5/site-packages/numpy/core/_methods.py", line 56, in _mean
    rcount = _count_reduce_items(arr, axis)
  File "/home/tom/.virtualenvs/nlp/lib/python3.5/site-packages/numpy/core/_methods.py", line 50, in _count_reduce_items
    items *= arr.shape[ax]
IndexError: tuple index out of range

Is there a nice way to do this or should I just do the maths myself?

Mazdak
Tom Carrick
  • It's obvious that you don't have any item in the first row's second axis. If you want to calculate the mean of all the items, you can flatten the array and then calculate the mean. – Mazdak Sep 20 '16 at 12:13
  • possible duplicate? http://stackoverflow.com/questions/10058227/python-calculating-mean-of-arrays-with-different-lengths – AlvaroP Sep 20 '16 at 12:16
  • What do you mean by `nice way` here? Efficient or short/compact code? – Divakar Sep 20 '16 at 12:21
  • I think I'm confused, I thought the axes ran horizontally, not vertically. Indeed, that seems to be the case when I run it on a symmetric data set, but maybe I'm misreading it. By nice way, I mean sticking within numpy rather than writing something myself or using `statistics` from stdlib. – Tom Carrick Sep 20 '16 at 12:24

2 Answers


You could perform the mean for each sub-array of foo using a list comprehension:

mean_foo = np.array( [np.mean(subfoo) for subfoo in foo] )

As suggested by @Kasramvd in another answer's comment, you can also use the map function (in Python 3, map returns an iterator, so wrap it in list() before passing it to np.array):

mean_foo = np.array(list(map(np.mean, foo)))
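For reference, here is a minimal self-contained sketch of the per-row approach. It assumes the ragged data is kept as a plain Python list of lists (the sample values and the names `row_means` and `row_means_map` are made up for illustration), which avoids building a ragged NumPy array in the first place:

```python
import numpy as np

# Assumed sample data: a plain list of lists with rows of different lengths.
foo = [[1], [1, 2], [1, 2, 3]]

# One mean per row; each row is reduced on its own, so lengths may differ.
row_means = np.array([np.mean(row) for row in foo])

# Same result with map(); list() is needed because map() is lazy in Python 3.
row_means_map = np.array(list(map(np.mean, foo)))

# The three row means are 1.0, 1.5 and 2.0.
print(row_means)
```

Both variants still loop in Python, so this is mainly a readability choice rather than a performance one.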
Bertrand Gazanion

We could use an almost vectorized approach based upon np.add.reduceat that takes care of the irregular length subarrays, for which we are calculating the average values. np.add.reduceat sums up elements in those intervals of irregular lengths after getting a 1D flattened version of the input array with np.concatenate. Finally, we need to divide the summations by the lengths of those subarrays to get the average values.

Thus, the implementation would look something like this -

lens = np.array(list(map(len, foo))) # list() is needed in Python 3, where map returns an iterator. Thanks to @Kasramvd on this!
vals = np.concatenate(foo)
shift_idx = np.append(0,lens[:-1].cumsum())
out = np.add.reduceat(vals,shift_idx)/lens.astype(float)
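The steps above can be wrapped into a small runnable sketch. The helper name `ragged_row_means` and the sample data are assumptions for illustration, and the sketch assumes every row is non-empty, since np.add.reduceat does not return zero for an empty interval:

```python
import numpy as np

def ragged_row_means(rows):
    """Mean of each row of a ragged list of lists (hypothetical helper).

    Assumes every row has at least one element.
    """
    # Length of each row, used both for the offsets and the final division.
    lens = np.array([len(r) for r in rows])
    # Flatten all rows into a single 1D float array.
    vals = np.concatenate([np.asarray(r, dtype=float) for r in rows])
    # Start index of each row inside the flattened array.
    starts = np.concatenate(([0], np.cumsum(lens)[:-1]))
    # Sum each interval [starts[i], starts[i+1]) and divide by its length.
    return np.add.reduceat(vals, starts) / lens

# Assumed sample data; the row means are 1.0, 1.5 and 5.0.
means = ragged_row_means([[1], [1, 2], [4, 5, 6]])
print(means)
```

Compared with the list comprehension, only the length computation and the flattening iterate over rows in Python; the summation itself runs inside NumPy.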
Divakar