Update (2/19/2019): I opened a report in the numexpr tracker: https://github.com/pydata/numexpr/issues/331
The pandas report is: https://github.com/pandas-dev/pandas/issues/25369
Unless I'm doing something I'm not supposed to, the new nullable integer extension dtypes appear to have a bug with the DataFrame.query method (the problem seems to be in the numexpr package):
import pandas as pd

df_test = pd.DataFrame(data=[4, 5, 6], columns=["col_test"])
df_test = df_test.astype(dtype={"col_test": pd.Int32Dtype()})
df_test.query("col_test != 6")
The last lines of the long error message are:
  File "...\site_packages\numexpr\necompiler.py", line 822, in evaluate
    zip(names, arguments)]
  File "...\site_packages\numexpr\necompiler.py", line 821, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
  File "...\site_packages\numexpr\necompiler.py", line 703, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object
The non-extension dtypes work fine:
import numpy as np

df_test = df_test.astype(dtype={"col_test": np.int32})
df_test.query("col_test != 6")
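A possible workaround, sketched under the assumption that the failure comes only from the numexpr evaluation path: either filter with a plain boolean mask, which does not go through an expression engine, or pass engine="python" to query so that numexpr is bypassed. The variable names filtered_mask and filtered_query are illustrative only, and this has not been checked against the exact pandas and numexpr versions that produced the traceback above.

```python
import pandas as pd

# Same nullable-integer frame as in the failing example.
df_test = pd.DataFrame(data=[4, 5, 6], columns=["col_test"])
df_test = df_test.astype(dtype={"col_test": pd.Int32Dtype()})

# Option 1: boolean-mask indexing; no query string, so no expression engine.
filtered_mask = df_test[df_test["col_test"] != 6]

# Option 2: keep the query string but ask pandas to evaluate it with the
# pure-Python engine instead of numexpr.
filtered_query = df_test.query("col_test != 6", engine="python")

print(filtered_mask)
print(filtered_query)
```

Both options should select the same rows. The Python engine is generally slower than numexpr on large frames, so this is a stopgap rather than a fix.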
(P.S. As an entirely separate issue, passing the dtype directly to the pd.DataFrame constructor doesn't work, which also seems buggy.)
Thanks.