
While writing a program that uses NumPy, I found that membership tests don't work as expected for NumPy dtype objects. Specifically, the result is unexpected for a set, but not for a list or tuple.

import numpy as np
x = np.arange(5).dtype
y = np.int64
print(x in {y}, x in (y,), x in [y])

The result is False True True: the set membership test fails, while the tuple and list tests succeed.

I found this in both Python 2.7 and 3.6, with NumPy 1.12.x installed.

Any idea why?

UPDATE

It looks like dtype objects don't respect an assumption Python makes about hashing: objects that compare equal must also have equal hashes.
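A minimal sketch of the mismatch, assuming a NumPy version newer than the 1.12.x used in the question. It uses np.dtype(np.int64) instead of np.arange(5).dtype, because the default integer type is int32 on some platforms (for example Windows with NumPy 1.x), which would change the results.

```python
import numpy as np

# Platform-independent stand-in for np.arange(5).dtype on a 64-bit Linux/macOS build.
x = np.dtype(np.int64)
y = np.int64

# dtype.__eq__ converts y to a dtype before comparing, so the two compare equal...
print(x == y)              # True

# ...but hash(x) is computed from the dtype's contents, while hash(y) is the
# ordinary hash of a class object, so the two hashes are unrelated.
print(hash(x) == hash(y))

# The data model requires: a == b implies hash(a) == hash(b).
# A set trusts the hash, so this lookup never reaches the == check.
print(x in {y})            # False
```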

See http://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/

and https://github.com/numpy/numpy/issues/5345

Thanks @user2357112 and @Fabien

zym1010
  • Related / Dupe: [Why do these dtypes compare equal but hash different?](https://stackoverflow.com/q/35293672/674039) / [Making an object x such that “x in \[x\]” returns False](https://stackoverflow.com/q/29692140/674039) – wim Sep 26 '20 at 15:35

2 Answers


The __hash__ and __eq__ implementations of dtype objects were pretty poorly thought out. Among other problems, the __hash__ and __eq__ implementations aren't consistent with each other. You're seeing the effects of that here.

Some other problems with dtype __hash__ and __eq__ are that

  • dtype objects are actually mutable in ways that affect both __hash__ and __eq__, something that should never be true of a hashable object. (Specifically, you can reassign the names of a structured dtype.)
  • dtype equality isn't transitive. For example, with the x and y in your question, we have x == y and x == 'int64', but y != 'int64'.
  • dtype __eq__ raises TypeError when it should return NotImplemented.
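The non-transitivity in the second bullet can be checked directly. This is a small sketch that again uses np.dtype(np.int64) as a platform-independent stand-in for the question's x; the comments describe how the comparison resolves in current Python and NumPy.

```python
import numpy as np

x = np.dtype(np.int64)  # stand-in for np.arange(5).dtype
y = np.int64

print(x == y)        # True: y is converted to dtype('int64') before comparing
print(x == 'int64')  # True: the string is also converted to a dtype
print(y == 'int64')  # False: a class and a str fall back to identity comparison
```

So x equals both y and 'int64', yet y and 'int64' are not equal to each other, which is exactly what a consistent __eq__ should never allow.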

You could submit a bug report, but looking at existing bug reports relating to those methods, it's unlikely to be fixed. The design is too much of a mess, and people are already relying on the broken parts.

user2357112

The difference lies in how sets implement the in operator in Python.

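Lists (and tuples) simply scan each element, checking for identity or equality with ==. Sets first hash the object you are looking for, and only compare it against stored elements that have the same hash.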
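A minimal pure-Python sketch of that difference, with no NumPy involved. AlwaysEqual is a hypothetical class whose __eq__ and __hash__ disagree in the same way dtype and np.int64 do. The "hash first, then ==" behaviour described in the comments is how CPython's set lookup works.

```python
class AlwaysEqual:
    """Hypothetical class: every instance compares equal, but hashes differ."""

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        # Claims equality with every other AlwaysEqual instance...
        return isinstance(other, AlwaysEqual)

    def __hash__(self):
        # ...but hashes by tag, so "equal" objects can have different hashes.
        return hash(self.tag)


a = AlwaysEqual(1)
b = AlwaysEqual(2)

print(a == b)     # True
print(a in [b])   # True: list membership scans and calls ==
print(a in (b,))  # True: tuple membership works the same way
print(a in {b})   # False: the hashes differ, so == is never called
```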

(See the related question: different meaning of the 'in' keyword for sets and lists.)

Sets are implemented as hash tables, which is how they guarantee uniqueness and fast lookups. Your two objects are also quite different kinds of object:

>>> x
dtype('int64')
>>> y
<class 'numpy.int64'>

Even though x == y returns True, hash(x) and hash(y) are different, so the set lookup never gets as far as the equality check.
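One possible workaround, which neither answer proposes: normalize everything to np.dtype before building the set and before testing membership, so that values which compare equal also hash equally. The set contents below are hypothetical, chosen for illustration.

```python
import numpy as np

# Hypothetical set mixing a scalar type and a dtype object.
allowed = {np.int64, np.dtype('float32')}

# Convert every entry to an np.dtype so equal dtypes share one hash.
allowed_dtypes = {np.dtype(t) for t in allowed}

x = np.dtype(np.int64)

print(x in allowed)                         # False: dtype vs class hash mismatch
print(np.dtype(x) in allowed_dtypes)        # True
print(np.dtype('int64') in allowed_dtypes)  # True
```

The trade-off is that np.dtype() raises TypeError for values it cannot interpret, so any input that might not be a valid dtype spec needs to be checked first.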

Fabien
  • Thanks. I looked at some other sources on this, like http://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/. It seems that NumPy implemented hash functions for dtype objects poorly? – zym1010 Jul 16 '17 at 03:42
  • According to [hpaulj](https://stackoverflow.com/questions/35293672/why-do-these-dtypes-compare-equal-but-hash-different), "looks like np.dtype does not define any special hash method, it just inherits from object.", so yeah... `in list({})` will do the job; `__eq__()` seems to work OK – Fabien Jul 16 '17 at 03:45