Python type checking of numpy arrays, including their dtype

Question

I can verify my function receives inputs in the correct type using:

import numpy as np

def foo(x: np.ndarray, y: float):
    return x * y

This way, if I try to call this function with an x that is not an np.ndarray, I get an error even before running the code.

What I don't know is how to verify the array's dtype. For example:

def return_valid_points_only(points: np.ndarray, valid: np.ndarray):
    assert points.shape == valid.shape
    return points[valid]

I want to check that valid is not only an np.ndarray but also that valid.dtype == bool.

In this example, if valid is supplied with 0s and 1s to indicate validity, the program won't fail and I will get terrible results, because an integer array is treated as a list of indices rather than as a mask.
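A small illustration of why 0s and 1s give wrong results rather than an error. The arrays below are made-up sample data: both masks have the same shape as points, so the shape assert passes, but NumPy treats the integer array as positions to pick, not as a mask.

```python
import numpy as np

points = np.array([10.0, 20.0, 30.0])
valid_bool = np.array([True, False, True])
valid_int = np.array([1, 0, 1])

# Same shape, so the assert in return_valid_points_only passes for both.
assert points.shape == valid_bool.shape == valid_int.shape

masked = points[valid_bool]   # boolean mask: keeps elements 0 and 2
indexed = points[valid_int]   # integer array: picks elements 1, 0, 1
print(masked)   # [10. 30.]
print(indexed)  # [20. 10. 20.]
```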

Thanks

These type hints are so that other programmers can easily understand the function, and so that if they use it wrongly (passing arguments of the wrong type), PyCharm lets them know on the spot. — Shaq, Jan 28 '21 at 17:10
I can add a line of code that asserts it at runtime, but I want PyCharm to "shout" at the programmer if they use my function wrongly, even from another file. — Shaq, Jan 28 '21 at 17:11
I guess we're dependent on PyCharm's features to meet your requirement, not on Python. — fountainhead, Jan 28 '21 at 17:13
Might be, but the current ": np.ndarray" annotation is compatible: I can also run this code later without PyCharm, so it is built into Python 3.7 and not just a PyCharm feature. — Shaq, Jan 28 '21 at 17:15
Thanks @hpaulj. Can you elaborate, please? What should the function call line look like? Should I import any library? You can write an answer as well. — Shaq, Jan 28 '21 at 17:22
While some `numpy` functions do check things like dimensions and dtype, more often they just try to convert/coerce inputs. For example it might do `x = np.asarray(x, dtype=float)` or `x = np.atleast_2d(x)`. — hpaulj, Jan 28 '21 at 17:47
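A minimal sketch of the coerce-rather-than-check style described in this comment. The function name scale_rows is hypothetical; it only shows np.asarray with a dtype followed by np.atleast_2d.

```python
import numpy as np

def scale_rows(x, factor):
    # Hypothetical helper. Coerce instead of rejecting:
    # lists, tuples and scalars all become float arrays.
    x = np.asarray(x, dtype=float)
    # Guarantee at least two dimensions, so a 1-D input is treated as a single row.
    x = np.atleast_2d(x)
    return x * factor

result = scale_rows([1, 2, 3], 2)
print(result.shape)  # (1, 3)
```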
Your requirement is for PyCharm to be aware of, and sensitive to, the fact that the body of your function expects arrays whose `dtype` is `bool`. I think that's a lot to expect from an IDE. If this IDE warning is really important for you, you could try creating a subclass of `numpy` array of `bool` `dtype`, and define your function as accepting your subclass array as its second arg, rather than accepting a `numpy` array. But that could be overkill, because subclassing numpy arrays is a bit more complicated than subclassing ordinary Python classes. — fountainhead, Jan 28 '21 at 18:04
When you pass a normal `numpy` array to your function, from Python's point of view (and hence from PyCharm's point of view) whether the array being passed has a `bool` `dtype` or some other `dtype` is captured in the **attributes** of the object, rather than in the **type** of the object. To meet your requirement, PyCharm would have to examine the attributes of the argument object, which I think is too much to expect from an IDE. — fountainhead, Jan 28 '21 at 18:11
Moreover, remember that the dtype attribute can be modified at application runtime. So, you could create an array consisting of `True` s and `False` s with `bool` `dtype`. At runtime, you could modify the `dtype` to `int`, which would essentially bypass all the IDE warnings. Agreed, this would be an extreme scenario that involves deliberate programmer malice rather than a programmer mistake. — fountainhead, Jan 28 '21 at 18:24
Thanks for the full answer @fountainhead. I won't make a subclass, since my function is called from many places in a very large codebase, so I believe it would create more mess than it would help. — Shaq, Jan 28 '21 at 18:29
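A rough sketch of the subclass idea from this comment, assuming a hypothetical class named BoolArray. Because a plain np.ndarray is not a BoolArray, a static checker or IDE should flag it when it is passed as valid; the constructor also rejects non-boolean data at runtime.

```python
import numpy as np

class BoolArray(np.ndarray):
    """Hypothetical ndarray subclass meant to wrap boolean data only."""

    def __new__(cls, data):
        arr = np.asarray(data)
        if arr.dtype != np.bool_:
            raise TypeError(f"BoolArray requires dtype bool, got {arr.dtype}")
        # view() reuses the same memory; only the Python type changes.
        return arr.view(cls)

def return_valid_points_only(points: np.ndarray, valid: BoolArray):
    assert points.shape == valid.shape
    return points[valid]

out = return_valid_points_only(np.array([1.0, 2.0, 3.0]), BoolArray([True, False, True]))
print(out)  # [1. 3.]
```

This also shows the complication mentioned above: arrays derived from a BoolArray (for example via astype(int)) are still BoolArray instances, but they skip __new__ and so are never checked.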
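A short illustration of the last two comments, using made-up sample arrays: a boolean array and an integer array have exactly the same Python type, and the difference lives only in the dtype attribute. It also shows that the same memory can be reinterpreted with another dtype at runtime (here with view() rather than by assigning to .dtype).

```python
import numpy as np

mask = np.array([True, False, True])
counts = np.array([1, 0, 1])

# Both objects have the same Python type, which is all a type hint can see.
same_type = type(mask) is type(counts)
print(same_type)          # True
# The difference is only in the dtype attribute.
print(mask.dtype)         # bool
print(counts.dtype.kind)  # i (a signed integer dtype)

# The same bytes reinterpreted as unsigned 8-bit integers.
as_bytes = mask.view(np.uint8)
print(as_bytes)           # [1 0 1]
```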

score 0 · Answer 1 · answered Jan 28 '21 at 20:08

Python is all about asking for forgiveness, not permission. That means that even your first definition, def foo(x: np.ndarray, y: float), relies on the user honoring the hint unless you run a static checker such as mypy: annotations are not enforced when the code runs.
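A quick way to see that the hints are not enforced at runtime: the annotations are stored on the function, but calling it with a str and an int (neither an ndarray nor a float) runs without complaint. The argument values are arbitrary examples.

```python
import numpy as np

def foo(x: np.ndarray, y: float):
    return x * y

# Neither argument matches its hint, yet Python runs the call anyway.
result = foo("ab", 3)
print(result)              # ababab
# The hints are only metadata attached to the function.
print(foo.__annotations__)
```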

There are a couple of approaches you can take here, usually in tandem. One is to write the function so that it works with whatever inputs are passed in, which can mean rejecting invalid inputs or coercing them. The other is to document your code carefully, so that users can make informed decisions. The second method is especially important, but I will focus on the first.

Numpy does most of the checking for you. For example, rather than expecting an array, it is idiomatic to coerce one:

x = np.asanyarray(x)
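A small sketch of what np.asanyarray does with different inputs: an existing ndarray is returned as-is without a copy, an ndarray subclass such as a masked array is passed through (np.asarray would drop it), and a list is converted. The sample data is arbitrary.

```python
import numpy as np

a = np.arange(3)
no_copy = np.asanyarray(a) is a
print(no_copy)  # True: an existing ndarray is returned unchanged

m = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
print(type(np.asanyarray(m)).__name__)  # MaskedArray: subclass passed through
print(type(np.asarray(m)).__name__)     # ndarray: asarray drops the subclass

converted = np.asanyarray([1, 2, 3])
print(converted)  # [1 2 3]
```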

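np.asanyarray(a) behaves like np.array(a, dtype, copy=False, order=order, subok=True) (copy=None in NumPy 2.x): it converts the input to an array, passes ndarray subclasses through, and avoids copying when it can. You can do something similar for y: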

y = np.asanyarray(y).item()
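A sketch of how the .item() idiom behaves, wrapped in a hypothetical helper named as_scalar: any input with exactly one element becomes a plain Python scalar, and anything larger raises ValueError.

```python
import numpy as np

def as_scalar(y):
    # Hypothetical helper: accept any array-like with exactly one element.
    return np.asanyarray(y).item()

print(as_scalar(2.5))                # 2.5
print(as_scalar([2.5]))              # 2.5
print(as_scalar(np.array([[2.5]])))  # 2.5

try:
    as_scalar([1.0, 2.0])
except ValueError as exc:
    print("rejected:", exc)
```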

This accepts any array-like with exactly one element, whether it is a scalar or a nested sequence. Another approach is to respect numpy's ability to broadcast arrays together: skip the .item() call, and if the user passes y as a list of x.shape[-1] elements, x * y scales each column of x by its own factor.
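A minimal sketch of the broadcasting variant of foo (a modified version, not the original): y is coerced but not reduced to a scalar, so both a single number and one factor per column work.

```python
import numpy as np

def foo(x, y):
    x = np.asanyarray(x)
    # No .item() here: y may be a scalar or anything that broadcasts against x.
    y = np.asanyarray(y)
    return x * y

x = np.ones((2, 3))
doubled = foo(x, 2.0)                      # every element scaled by 2
per_column = foo(x, [1.0, 10.0, 100.0])    # one factor per column (x.shape[-1] == 3)
print(per_column)
```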

For your second function, you have a couple of options. One is to allow fancy indexing, so that whether the user passes in a list of indices or a boolean mask, both work. If, on the other hand, you insist on a boolean mask, you can either check or coerce the dtype.
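A sketch of the "allow fancy indexing" option, using a hypothetical function name select_points: passing integer indices or a boolean mask selects the same rows here.

```python
import numpy as np

def select_points(points, valid):
    # Hypothetical variant that accepts integer indices or a boolean mask.
    points = np.asanyarray(points)
    return points[np.asanyarray(valid)]

points = np.array([[0, 0], [1, 1], [2, 2]])
by_index = select_points(points, [0, 2])
by_mask = select_points(points, [True, False, True])
print(np.array_equal(by_index, by_mask))  # True
```

The trade-off is that 0/1 integers are then legitimately interpreted as indices, so this option does not protect against the mistake described in the question.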

If you check, keep in mind that numpy's indexing operation will already raise an error if the array sizes don't match. You only need to check the dtype itself:

import numpy as np
def return_valid_points_only(points, valid):
    points = np.asanyarray(points)
    valid = np.asanyarray(valid)
    if valid.dtype != bool:
        raise ValueError('valid argument must be a boolean mask')
    return points[valid]
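A small demonstration of the division of labor described above, with arbitrary sample data: NumPy itself raises IndexError when a boolean mask has the wrong length, while a 0/1 integer array would index without error, which is exactly the case the dtype check catches.

```python
import numpy as np

points = np.arange(6).reshape(3, 2)

# NumPy already validates the mask length: a 2-element mask for 3 rows fails.
try:
    points[np.array([True, False])]
    size_error = False
except IndexError as exc:
    size_error = True
    print("IndexError:", exc)

# What NumPy does not catch: a 0/1 integer array, which the dtype check rejects.
valid = np.asanyarray([1, 0, 1])
needs_rejection = valid.dtype != bool
print(needs_rejection)  # True, so the ValueError branch would run
```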

If you choose to coerce instead, users will be allowed to pass zeros and ones, and inputs that are already boolean arrays will not be copied unnecessarily:

valid = np.asanyarray(valid, bool)
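A sketch of the coercion behavior: 0/1 values become a proper mask, an array that is already boolean is returned without a copy, and, as a caveat worth documenting, any nonzero value (such as 2 or -1) is silently treated as True.

```python
import numpy as np

valid = np.asanyarray([1, 0, 1], bool)
print(valid)  # [ True False  True]

already_bool = np.array([True, False, True])
not_copied = np.asanyarray(already_bool, bool) is already_bool
print(not_copied)  # True: no copy made

# Caveat: every nonzero value becomes True.
loose = np.asanyarray([0, 2, -1], bool)
print(loose)  # [False  True  True]
```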
