NumPy int arrays can't store missing values.
>>> import numpy as np
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> myArray = np.arange(10)
>>> myArray.dtype
dtype('int32')
>>> myArray[0] = None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
>>> myArray.astype(dtype='float')
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> myFloatArray = myArray.astype(dtype='float')
>>> myFloatArray[0] = None
>>> myFloatArray
array([ nan, 1., 2., 3., 4., 5., 6., 7., 8., 9.])
Pandas warns about this in the docs, under Caveats and Gotchas, "Support for integer NA". Wes McKinney also reiterates the point in this Stack Overflow question.
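The pandas caveat is easy to demonstrate: as soon as a missing value appears in an integer Series, pandas upcasts it to float and stores the gap as NaN (a minimal sketch):

```python
import pandas as pd

# A Series built from plain ints gets an integer dtype.
s = pd.Series([1, 2, 3])
print(s.dtype)   # an int dtype

# Introduce a missing value and the whole Series is upcast to float64;
# the None is stored as NaN.
s2 = pd.Series([1, None, 3])
print(s2.dtype)  # float64
```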
I need to be able to store missing values in an int array. I'm INSERTing rows into my database, which I've set up to accept only ints of varying sizes.
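For what it's worth, the database side handles this fine: Python's DB-API drivers bind None as SQL NULL, so an int-or-None sequence inserts cleanly into a nullable integer column. A minimal sketch with the stdlib sqlite3 module and a hypothetical `readings` table:

```python
import sqlite3

# Hypothetical table mirroring the use case: an INTEGER column that allows NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value INTEGER)")

# Python None is bound as SQL NULL by the driver.
rows = [(0,), (None,), (2,)]
conn.executemany("INSERT INTO readings (value) VALUES (?)", rows)

print(conn.execute("SELECT value FROM readings").fetchall())
# -> [(0,), (None,), (2,)]
```

The open question is therefore only how to represent the missing values on the NumPy side before the INSERT.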
My current workaround is to store the data in an object array, which can hold both ints and None as elements.
>>> myArray.astype(dtype='object')
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=object)
>>> myObjectArray = myArray.astype(dtype='object')
>>> myObjectArray[0] = None
>>> myObjectArray
array([None, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=object)
This seems to be memory-intensive and slow for large data sets. I was wondering if anyone has a better solution while native NA support in NumPy is still under development.
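One alternative I've come across is NumPy's masked arrays, which keep the underlying int dtype and track missing entries in a separate boolean mask instead of in the data buffer (a minimal sketch; I don't know if this is any faster in practice):

```python
import numpy as np

# A masked array keeps the integer dtype; missing entries are recorded
# in a boolean mask rather than stored as a value.
myMaskedArray = np.ma.masked_array(np.arange(10), mask=False)
myMaskedArray[0] = np.ma.masked

print(myMaskedArray)        # [-- 1 2 3 4 5 6 7 8 9]
print(myMaskedArray.dtype)  # still an int dtype
```

Masked entries would still need to be translated to None before the INSERT, e.g. via `myMaskedArray.tolist()`, which replaces masked elements with None.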