
I have ~1 million pandas dataframes, each containing 0-10,000 rows and 160 columns. In each dataframe, 5-10 columns may hold the values [False, True, np.nan] and have dtype 'object' or 'bool'. Some of the 'object' dtype columns contain only True or False. I treat all of these columns as if they could contain [False, True, np.nan], so I never write df.loc[df['col']] and instead write df.loc[df['col'] == True], etc.
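To make the selection pattern concrete, here is a minimal sketch on a made-up three-row frame (the column names `col` and `x` are only for illustration). It shows the explicit comparisons I use so that np.nan rows are never selected by accident:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an object column that holds True, False and NaN.
df = pd.DataFrame({
    "col": np.array([True, False, np.nan], dtype="object"),
    "x": [1, 2, 3],
})

# Explicit comparisons return plain bool masks; NaN compares unequal to both.
is_true = df["col"] == True
is_false = df["col"] == False
is_missing = df["col"].isna()

print(df.loc[is_true, "x"].tolist())
print(df.loc[is_false, "x"].tolist())
print(df.loc[is_missing, "x"].tolist())
```

Each row ends up in exactly one of the three masks, which is the property I rely on.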

When I do a concat on a collection of these frames, I occasionally get the warning "In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead."

Both concats below trigger the warning because df2 has a bool-only column with dtype object:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'foo': np.zeros((2, ), dtype='bool')}, index=[0, 1])                   # bool dtype
df2 = pd.DataFrame({'foo': np.ones((2, ), dtype='bool').astype('object')}, index=[2, 3])  # all-bool, object dtype
df3 = pd.DataFrame({'foo': np.array([np.nan, np.nan])}, index=[6, 7])                      # all-NaN, float dtype

df_ = pd.concat([df1, df2])  # warns
df_ = pd.concat([df2, df3])  # warns
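For anyone who wants to check this on their own pandas version, here is a rough sketch that records whatever warnings a concat emits, before and after casting the object column to bool as the message suggests. The helper name `concat_warnings` is made up for this example, and whether a warning shows up at all depends on the installed pandas version (I only observed it on 1.5.2):

```python
import warnings

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"foo": np.zeros(2, dtype="bool")}, index=[0, 1])
df2 = pd.DataFrame({"foo": np.ones(2, dtype="bool").astype("object")}, index=[2, 3])


def concat_warnings(frames):
    """Concatenate frames and return (result, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        result = pd.concat(frames)
    return result, [str(w.message) for w in caught]


# Original frames: df2 has an all-bool column stored as object.
_, before = concat_warnings([df1, df2])

# Same frames after an explicit cast of the object column to bool.
df2_cast = df2.astype({"foo": "bool"})
result, after = concat_warnings([df1, df2_cast])

print("warnings before cast:", before)
print("warnings after cast: ", after)
print("result dtype:", result["foo"].dtype)
```

Note that the explicit cast is only safe for a column known to contain no np.nan, since casting np.nan to bool yields True.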

I have two questions:

  1. Is df = df.infer_objects() the appropriate way to handle this, or would it be better to convert the columns to categorical? Two of my object columns are image thumbnails, but I assume the amount of data in a column has no impact on speed.

  2. Why do I get this warning when concatenating? In the pandas release notes for 1.5.0 the change is described as "Deprecated treating all-bool object-dtype columns as bool-like in DataFrame.any() and DataFrame.all() with bool_only=True, explicitly cast to bool instead (GH46188)". How does concat use any()/all()?
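To illustrate what I mean in question 1, here is a small sketch of what infer_objects() does to the two kinds of object columns I have. The column names are invented for the example. The last step uses the nullable "boolean" extension dtype, which is not something I use today and is included only as a possible alternative for columns that really contain np.nan:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the two kinds of object columns described above.
df = pd.DataFrame({
    "all_bool": np.array([True, False], dtype="object"),
    "with_nan": np.array([True, np.nan], dtype="object"),
})

# infer_objects() attempts a soft conversion of object columns.
inferred = df.infer_objects()
print(inferred.dtypes)

# Possible alternative (an assumption, not my current approach): the nullable
# "boolean" dtype stores True/False/<NA> without falling back to object.
nullable = df.astype({"with_nan": "boolean"})
print(nullable.dtypes)
```

As far as I can tell, the all-bool object column becomes bool after infer_objects(), while a column that mixes booleans with np.nan stays object, so infer_objects() alone would not make the dtypes uniform across frames.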

Pandas 1.5.2, python 3.8.15

Slightly related: Bool and missing values in pandas

Frank_Coumans