Value to Assign to Missing Values in uint Numpy Array

Question

A NumPy array z is constructed from two Python lists, x and y, where values of y can be 0 and values of x are not contiguous (i.e. some values are skipped).

Since y values can also be 0, it will be confusing to assign missing values in z to be 0 as well.

What is the best practice to avoid this confusion?

import numpy as np

# Construct `z`
x = [1, 2, 3, 5, 8, 13]
y = [12, 34, 56, 0, 78, 0]
z = np.zeros(max(x) + 1, dtype=np.uint32)  # missing values become 0 (np.ndarray(...) would leave them uninitialized)
for i in range(len(x)):
    z[x[i]] = y[i]

print(z)        # [ 0 12 34 56  0  0  0  0 78  0  0  0  0  0]
print(z[4])     # missing value but is assigned 0
print(z[13])    # non-missing value but also assigned 0

Can you accept signed integers? What do you want to do with the missing values later? — David Hoffman, Aug 21 '20 at 02:42
@DavidHoffman Best to stick to unsigned integers, but it is probably beneficial to also know the solution when signed integers can be used. When a missing value is detected when reading from the array, a different logic may be used in the main program, such as raising an error or accessing the value at another index until a non-missing element is found — Athena Wisdom, Aug 21 '20 at 12:56

CypherX · Accepted Answer · 2020-08-22T00:32:56.127

2

Solution

You could typically assign np.nan or any other value for the non-existing indices in x.

Also, there is no need for the for loop. You can assign all values of y in one line with index assignment, z[x] = y, as in the code below.

However, since you are casting to uint32, you cannot use np.nan: NaN is a floating-point value, and integer dtypes have no representation for it. Instead, you could use a large number of your choice (for example, 999999) which, by design, will not show up in y. For more details, please refer to the links shared in the References section below.
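To make the dtype point concrete, here is a minimal sketch. Using np.iinfo(np.uint32).max as the sentinel is my own assumption rather than part of the original answer; it only works if y can never contain that value.

```python
import numpy as np

# NaN exists only for floating-point dtypes, so an array filled with it is float64.
with_nan = np.full(4, np.nan)
print(with_nan.dtype)  # float64

# Assumed alternative sentinel: the largest value a uint32 can hold.
SENTINEL = np.iinfo(np.uint32).max
z = np.full(4, SENTINEL, dtype=np.uint32)
print(z.dtype)  # uint32
```

Taking the sentinel from np.iinfo ties it to the dtype, so it cannot overflow the way an arbitrary constant might with a smaller unsigned type.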

import numpy as np

x = [1, 2, 3, 5, 8, 13]
y = [12, 34, 56, 0, 78, 0]
# cannot use np.nan with uint32 as np.nan is treated as a float
# choose some large value instead: 999999
z = np.full(max(x) + 1, 999999, dtype=np.uint32)
z[x] = y
z

# array([999999,     12,     34,     56, 999999,      0, 999999, 999999,
#            78, 999999, 999999, 999999, 999999,      0], dtype=uint32)
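The question's comments mention two ways to handle a missing value on read: raise an error, or move to another index until a non-missing element is found. A rough sketch of both, assuming the 999999 sentinel from above, might look like this. The helper names read_strict and read_next_valid are hypothetical and only for illustration.

```python
import numpy as np

MISSING = 999999  # sentinel from the answer; must never occur in y

x = [1, 2, 3, 5, 8, 13]
y = [12, 34, 56, 0, 78, 0]
z = np.full(max(x) + 1, MISSING, dtype=np.uint32)
z[x] = y

def read_strict(arr, i):
    """Hypothetical helper: raise if index i holds the sentinel."""
    if arr[i] == MISSING:
        raise KeyError(f"no value stored at index {i}")
    return int(arr[i])

def read_next_valid(arr, i):
    """Hypothetical helper: scan forward from i to the first non-missing entry."""
    valid = np.flatnonzero(arr[i:] != MISSING)
    if valid.size == 0:
        raise KeyError(f"no value stored at or after index {i}")
    return int(arr[i + valid[0]])

print(read_strict(z, 13))     # 0 (a real value, not a missing one)
print(read_next_valid(z, 4))  # 0 (index 4 is missing, index 5 holds 0)
print(read_next_valid(z, 6))  # 78 (indices 6 and 7 are missing)
```

With the sentinel in place, a stored 0 and a missing entry are no longer confused, which was the original concern.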

References

edited Aug 22 '20 at 00:32

answered Aug 21 '20 at 01:59

CypherX


@athena-wisdom Does this help? – CypherX Aug 21 '20 at 02:04
Why `y.copy()`? – mathfux Aug 21 '20 at 02:09
How do you preserve the original `np.uint32` `dtype` after multiplying with `np.nan`? Seems like the numbers are now `np.float64` – Athena Wisdom Aug 21 '20 at 02:16
@mathfux Good catch. That was a typo. Removed the `.copy()`. Thank you. – CypherX Aug 21 '20 at 02:36
@AthenaWisdom Try now. You need a large value (such as 999999) that you don't expect to find in `y`, and assign it instead of `nan`, since `np.nan` is treated as a `float`. – CypherX Aug 21 '20 at 02:38
