4

As shown in the answer to the question "Convert python list with None values to numpy array with nan values", it is straightforward to initialize a masked numpy array from a list with None values if we enforce dtype=float. The None values get converted to nan and we can simply do:

ma.masked_invalid(np.array(a, dtype=float), copy=False)

This, however, does not work for int:

ma.masked_invalid(np.array(a, dtype=int), copy=False)

since the intermediate np.array cannot be created from a list containing None values: there is no int nan, so the conversion raises a TypeError.
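To make the difference concrete, here is a minimal sketch of the two conversions side by side. The variable names (as_float, int_failed) are only illustrative.

```python
import numpy as np

a = [1, 2, 3, None, 4]

# With dtype=float, None is converted to nan.
as_float = np.array(a, dtype=float)

# With dtype=int, None has no integer representation, so NumPy raises.
try:
    np.array(a, dtype=int)
    int_failed = False
except TypeError:
    int_failed = True

print(as_float)
print(int_failed)
```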

What is the most efficient way to initialize a masked array from a Python list of ints that also contains None values, in such a way that those None values become masked?

Andrzej Pronobis
  • I don't think there's going to be a good way to do this without a temporary array of object dtype. Make sure the solution you pick doesn't wind up with an object dtype at the end of it. – user2357112 May 20 '15 at 02:42

3 Answers

4

The most elegant solution I have found so far (and it is not elegant at all) is to initialize a masked array of type float and convert it to int afterwards:

ma.masked_invalid(np.array(a, dtype=float), copy=False).astype(int)

This generates a proper masked NumPy array of int dtype in which the None values of the initial list a are masked. For instance, for:

a = [1, 2, 3, None, 4]
ma.masked_invalid(np.array(a, dtype=float), copy=False).astype(int)

we get:

masked_array(data = [1 2 3 -- 4],
             mask = [False False False  True False],
       fill_value = 999999)

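Also, the underlying data at each masked position is whatever the nan-to-int cast produces (here the minimum int64 value; the result of that cast is not guaranteed and may differ by platform), i.e.

ma.masked_invalid(np.array(a, dtype=float), copy=False).astype(int).data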

gives:

array([                   1,                    2,                    3,
       -9223372036854775808,                    4])
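A variant of the same float round trip, as a rough sketch rather than a definitive implementation: it replaces the nan slots with a fill value before casting, so the result does not depend on how nan is cast to int. The helper name masked_int_via_float and the fill parameter are hypothetical. Note that any float route only represents integers exactly up to 2**53, which the last lines illustrate.

```python
import numpy as np
import numpy.ma as ma

def masked_int_via_float(values, fill=0):
    # Hypothetical helper: None -> nan -> masked slot holding `fill`.
    as_float = np.array(values, dtype=float)
    mask = np.isnan(as_float)
    # Replace nan before the cast so no nan is ever converted to int.
    data = np.where(mask, fill, as_float).astype(np.int64)
    return ma.array(data, mask=mask)

result = masked_int_via_float([1, 2, 3, None, 4])
print(result)

# Caveat: integers above 2**53 are rounded by the float conversion.
big = masked_int_via_float([2**53 + 1, None])
print(int(big.data[0]) == 2**53 + 1)
```

If the list can hold very large integers, a route that never goes through float is the safer choice.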
Andrzej Pronobis
0

You can't directly; however, you can create a numpy array of object dtype:

ma.masked_invalid(np.array(a, dtype=object), copy=False)

EDIT

Otherwise, you can take a look at the question "NumPy or Pandas: Keeping array type as integer while having a NaN value".
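As far as I know, np.isfinite does not accept object arrays, so masked_invalid on an object-dtype array may raise a TypeError rather than mask the None entries. A sketch of one way to still go through a temporary object array and end up with an int dtype, with the mask built explicitly (the names obj, mask and result are illustrative only):

```python
import numpy as np
import numpy.ma as ma

a = [1, 2, 3, None, 4]

# Temporary object array that can hold None alongside the ints.
obj = np.array(a, dtype=object)

# Build the mask from the original list.
mask = np.array([x is None for x in a], dtype=bool)

# Overwrite the None slots with a placeholder, then cast to a real int dtype.
obj[mask] = 0
result = ma.array(obj.astype(np.int64), mask=mask)

print(result)
print(result.dtype)
```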

farhawa
0

It's possible to do this by first creating two uninitialized arrays of the same length as the list: one with data type int that will hold the data of the masked array, and another with data type bool that will become the mask itself.

Then we traverse the Python list. In arr_without_none we replace every occurrence of None with the default value, and in mask_mat we record whether the original value in the list was None (True) or an integer (False). At the end we build a masked array out of these two components.

import numpy
import numpy.ma as ma

def masked_int_array(arr, default=0):
    # Uninitialized buffers for the data and the mask.
    arr_without_none = numpy.empty(len(arr), dtype=int)
    mask_mat = numpy.empty(len(arr), dtype=bool)
    for i, value in enumerate(arr):
        # Store the default in place of None and mark that slot as masked.
        arr_without_none[i] = default if value is None else value
        mask_mat[i] = value is None
    return ma.array(data=arr_without_none, dtype=int, mask=mask_mat, copy=False)
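The same idea can be written without the explicit index loop by letting np.fromiter fill both arrays. This is only a sketch of an alternative, not a benchmarked improvement; the function name masked_int_array_fromiter is hypothetical.

```python
import numpy as np
import numpy.ma as ma

def masked_int_array_fromiter(arr, default=0):
    n = len(arr)
    # True wherever the original list holds None.
    mask = np.fromiter((x is None for x in arr), dtype=bool, count=n)
    # Integer data with None replaced by the default value.
    data = np.fromiter(
        (default if x is None else x for x in arr), dtype=np.int64, count=n
    )
    return ma.array(data, mask=mask)

result = masked_int_array_fromiter([1, 2, 3, None, 4])
print(result)
print(result.sum())
```

Because the values never pass through float, large integers keep their exact value as long as they fit in int64.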
Greg Nisbet