0

Here's a brief example of a function. It maps a vector to a vector. However, entries that are NaN or inf should be ignored. Currently this looks rather clumsy to me. Do you have any suggestions?

from scipy import stats
import numpy as np

def p(vv):
    mask = np.isfinite(vv)
    y = np.nan * vv  # same shape as vv, filled with NaN (np.NaN was removed in NumPy 2.0)
    v = vv[mask]

    y[mask] = 1/v*(stats.hmean(v)/len(v))
    return y
tschm
  • What problems do you have with the NaNs? Is your question "How to handle NaNs"? Your title isn't very clear. – kkuilla Feb 17 '14 at 08:56
  • I just want to learn more about them. I thought there might be a more elegant construction that operates only on the finite entries. – tschm Feb 17 '14 at 08:59
  • It might be off-topic, as your question should be about a specific problem. For example, off-topic questions include those where "there is no actual problem to be solved: 'I'm curious if other people feel like I do.'" (see http://stackoverflow.com/help/dont-ask) – kkuilla Feb 17 '14 at 09:13

3 Answers

1

You can replace the NaN values with zero using NumPy's isnan function and then remove the zeros as follows:

import numpy as np

def p(vv):
    # assuming vv is your array; work on a float copy so the caller's data is untouched
    vv = np.array(vv, dtype='float64')

    # use NumPy's isnan function to replace the NaN values in the array with zero
    replace_NaN = np.isnan(vv)
    vv[replace_NaN] = 0

    # convert array vv to a list
    vv_list = vv.tolist()
    new_list = []

    # loop over vv_list and exclude 0 values
    # (note: this also drops genuine zeros and keeps inf entries)
    for i in vv_list:
        if i != 0:
            new_list.append(i)

    # build the result array
    vv = np.array(new_list, dtype='float64')

    return vv
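
The loop above can probably be avoided with boolean indexing. The following is only a minimal sketch, not part of the original answer: the name `drop_non_finite` is made up for illustration, and it assumes a one-dimensional numeric input. Unlike the zero-replacement approach, it keeps genuine zeros and also drops inf entries.

```python
import numpy as np

def drop_non_finite(vv):
    """Return a new 1-D array holding only the finite entries of vv.

    Hypothetical helper for illustration; the input is not modified.
    """
    vv = np.asarray(vv, dtype=float)
    # np.isfinite is False for NaN, +inf and -inf
    return vv[np.isfinite(vv)]

values = np.array([1.0, 0.0, np.nan, 2.0, np.inf])
print(drop_non_finite(values))  # [1. 0. 2.]
```

Note that, like the answer above, this returns a shorter array, so the positions of the original entries are lost.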
user1749431
1

I have come up with this kind of construction:

from scipy import stats
import numpy as np


## operate only on the valid entries of x and use the same mask on the resulting vector y
def __f(func, x):
    mask = np.isfinite(x)
    y = np.nan * x  # same shape as x, filled with NaN (np.NaN was removed in NumPy 2.0)
    y[mask] = func(x[mask])
    return y


# implementation of the parity function
def __pp(x):
    return 1/x*(stats.hmean(x)/len(x))


def pp(vv):
    return __f(__pp, vv)
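
The same wrapper idea could also be written as a decorator. This is a sketch under assumptions rather than a definitive version: the names `finite_only` and `inverse_weights` are invented here, and the input is assumed to hold positive finite values plus NaN/inf. It also uses the identity hmean(x) / len(x) = 1 / sum(1 / x), so SciPy is not needed for this particular function.

```python
import functools
import numpy as np

def finite_only(func):
    """Apply func to the finite entries of x; all other positions become NaN.

    Hypothetical decorator, equivalent in spirit to the __f helper above.
    """
    @functools.wraps(func)
    def wrapper(x):
        x = np.asarray(x, dtype=float)
        mask = np.isfinite(x)
        y = np.full(x.shape, np.nan)
        y[mask] = func(x[mask])
        return y
    return wrapper

@finite_only
def inverse_weights(x):
    # hmean(x) / len(x) equals 1 / sum(1 / x) for positive x,
    # so this is 1/x * (hmean(x) / len(x)) without calling SciPy
    inv = 1.0 / x
    return inv / inv.sum()

w = inverse_weights(np.array([1.0, np.nan, 2.0, np.inf, 4.0]))
print(w)
```

The decorated function only ever sees the finite entries, so any other vector-to-vector function can be wrapped the same way.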
tschm
1

Masked arrays provide this functionality and let you specify the mask however you like. The NumPy 1.18 docs for them are here: https://numpy.org/doc/1.18/reference/maskedarray.generic.html#what-is-a-masked-array

In a masked array, entries whose mask value is False are used in calculations, while entries whose mask value is True are ignored.

Example for obtaining the mean of only the finite values using np.isfinite():

import numpy as np

# Seeding for reproducing these results
np.random.seed(0)

# Generate random data and add some non-finite values
x = np.random.randint(0, 5, (3, 3)).astype(np.float32)
x[1,2], x[2,1], x[2,2] = np.inf, -np.inf, np.nan
# array([[  4.,   0.,   3.],
#        [  3.,   3.,  inf],
#        [  3., -inf,  nan]], dtype=float32)

# Make masked array. Note the logical not of isfinite
x_masked = np.ma.masked_array(x, mask=~np.isfinite(x))

# Mean of entire masked matrix
x_masked.mean()
# 2.6666666666666665

# Masked matrix's row means
x_masked.mean(1)
# masked_array(data=[2.3333333333333335, 3.0, 3.0],
#              mask=[False, False, False],
#        fill_value=1e+20)

# Masked matrix's column means
x_masked.mean(0)
# masked_array(data=[3.3333333333333335, 1.5, 3.0],
#              mask=[False, False, False],
#        fill_value=1e+20)

Note that scipy.stats.hmean() also works with masked arrays.

Note that if all you care about is detecting NaNs and leaving infs, then you can use np.isnan() instead of np.isfinite().
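
Applied to the function from the question, a masked-array version might look like the sketch below. It is an illustration only: `p_masked` is a made-up name, `np.ma.masked_invalid` is used as a shortcut for masking NaN and inf, the harmonic mean is folded in through the identity hmean(v) / len(v) = 1 / sum(1 / v), and the finite entries are assumed to be positive.

```python
import numpy as np

def p_masked(vv):
    """1/v * (hmean(v) / len(v)) over the finite entries of vv; NaN elsewhere.

    Hypothetical helper; assumes the finite entries are positive.
    """
    vm = np.ma.masked_invalid(np.asarray(vv, dtype=float))  # masks NaN and +/-inf
    inv = 1.0 / vm                  # masked entries stay masked
    weights = inv / inv.sum()       # sum() skips masked entries
    return weights.filled(np.nan)   # back to a plain ndarray with NaN in masked slots

print(p_masked([1.0, np.nan, 2.0, np.inf, 4.0]))
```

The result has the same shape as the input, with NaN wherever the input was NaN or inf, which is the behaviour the question asks for.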

prijatelj