I want to take contiguous blocks of rows and quickly apply nanmean to each column within a block, without looping.

To be specific, consider the reduction indices r = [0, 2, 3] (the starting row of each block) and the data array

import numpy as np

a = np.array([
             [2, 3, 4],
             [3, np.nan, 5],
             [16, 66, 666],
             [2, 2, 5],
             [np.nan, 3, 4],
             [np.nan, 4, 5],
             [np.nan, 5, 4],
             [3, 6, 4.5],
           ])

Then I want to get back:

b = np.array([
             [2.5,3,4.5],
             [16,66,666],
             [2.5,4,4.5],
           ])

The top answer to this question solves the problem (for a single column) by using reduceat. Unfortunately for me, that trick does not work here, because nanmean is not a ufunc.
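As a point of reference, the loop-based version of this reduction might look like the sketch below. The name `nanmean_groups_loop` is hypothetical (it does not come from the post), and the sketch assumes each block runs from one start index up to the next, with the last block running to the end of the array.

```python
import numpy as np

def nanmean_groups_loop(x, starts):
    """Loop-based reference: column-wise nanmean over row blocks.

    `starts` holds the first row index of each block (an assumption
    about how the reduction indices are meant to be read).
    """
    # Each block ends where the next one starts; the last ends at len(x).
    stops = list(starts[1:]) + [len(x)]
    return np.array([np.nanmean(x[lo:hi], axis=0)
                     for lo, hi in zip(starts, stops)])

a = np.array([
    [2, 3, 4],
    [3, np.nan, 5],
    [16, 66, 666],
    [2, 2, 5],
    [np.nan, 3, 4],
    [np.nan, 4, 5],
    [np.nan, 5, 4],
    [3, 6, 4.5],
])

b = nanmean_groups_loop(a, [0, 2, 3])
print(b)
```

This is only useful as a correctness check for a vectorized solution; the Python-level loop is exactly what the question is trying to avoid.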

Brian B

1 Answer

I don't think there's a one-liner for this, because NumPy has no NaN-ignoring ufunc for addition; nansum and nanmean are ordinary functions, so they have no reduceat method.

But you can do something based on reduceat after (temporarily) replacing all the NaNs in a with zeros. For example, here's a quick function that accomplishes what you want:

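def nanmean_reduceat(x, indices):
    # x must be a float array; it is modified in place and then restored.
    mask = np.isnan(x)
    # use try-finally to make sure x is reset
    # to its original state even if an error is raised.
    try:
        # zero out the NaNs so they do not contribute to the block sums
        x[mask] = 0
        # block sums divided by the number of non-NaN entries in each block
        return np.add.reduceat(x, indices) / np.add.reduceat(~mask, indices)
    finally:
        x[mask] = np.nan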

Then you can call:

>>> nanmean_reduceat(a, [0, 2, 3])
array([[   2.5,    3. ,    4.5],
       [  16. ,   66. ,  666. ],
       [   2.5,    4. ,    4.5]])
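If temporarily modifying the input is a concern, one alternative is to work on a zero-filled copy instead. The sketch below is not part of the original function: `nanmean_reduceat_copy` is a hypothetical name, and it assumes the input can be converted to a float array. It trades the try/finally for one extra array allocation.

```python
import numpy as np

def nanmean_reduceat_copy(x, indices):
    """Block-wise nanmean via reduceat without mutating x (a sketch)."""
    x = np.asarray(x, dtype=float)
    mask = np.isnan(x)
    # np.where builds a new array with NaNs replaced by 0; x is untouched.
    sums = np.add.reduceat(np.where(mask, 0.0, x), indices, axis=0)
    # Count the non-NaN entries per block; the explicit int cast makes
    # sure the booleans are summed as counts.
    counts = np.add.reduceat((~mask).astype(int), indices, axis=0)
    return sums / counts

a = np.array([
    [2, 3, 4],
    [3, np.nan, 5],
    [16, 66, 666],
    [2, 2, 5],
    [np.nan, 3, 4],
    [np.nan, 4, 5],
    [np.nan, 5, 4],
    [3, 6, 4.5],
])

result = nanmean_reduceat_copy(a, [0, 2, 3])
print(result)
```

Note that a block containing only NaNs in some column has a count of zero there, so the division yields NaN for that entry (with a runtime warning), which mirrors what np.nanmean returns for an all-NaN slice.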

Hope that helps!

Edit: for brevity, I removed the empty except block and moved the return statement inside the try block. Because of the way finally clauses work, the resetting of x is still executed before the function returns!
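The point about finally can be checked in isolation. This is a minimal sketch with hypothetical names (`return_inside_try`, `events`) that records the order in which the try body and the finally clause run when the try body returns:

```python
events = []

def return_inside_try():
    try:
        events.append("try body")
        return "returned value"
    finally:
        # Runs after the return value has been computed,
        # but before control actually leaves the function.
        events.append("finally block")

value = return_inside_try()
print(value)   # returned value
print(events)  # ['try body', 'finally block']
```

The same ordering is what lets nanmean_reduceat compute its result from the zero-filled array and still hand back x with its NaNs restored.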

jakevdp