Appending two pandas DataFrames behaves unexpectedly when a column is entirely null (NaN) in one DataFrame and boolean in the other. In the appended result, that column is typed as float64 and the booleans are converted to 1.0 and 0.0 according to their original values. Example:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(data=[[1, 2, True], [10, 20, False]], columns=['a', 'b', 'c'])
df1
    a   b      c
0   1   2   True
1  10  20  False
df2 = pd.DataFrame(data=[[1, 2], [10, 20]], columns=['a', 'b'])
df2['c'] = np.nan
df2
    a   b   c
0   1   2 NaN
1  10  20 NaN
Appending:
df1.append(df2)
    a   b    c
0   1   2  1.0
1  10  20  0.0
0   1   2  NaN
1  10  20  NaN
My workaround is to cast the column back to bool, but this turns the NaN values into True:
appended_df = df1.append(df2)
appended_df
    a   b    c
0   1   2  1.0
1  10  20  0.0
0   1   2  NaN
1  10  20  NaN
appended_df['c'] = appended_df.c.astype(bool)
appended_df
    a   b      c
0   1   2   True
1  10  20  False
0   1   2   True
1  10  20   True
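If the goal is to keep both the booleans and the missing values, one option may be pandas' nullable `boolean` extension dtype. The sketch below is not part of the original workaround: it assumes a pandas version that ships this dtype (1.0 or later) and uses `pd.concat` in place of `append`, since `DataFrame.append` was removed in pandas 2.0. It casts column `c` in both frames before combining them.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2, True], [10, 20, False]], columns=['a', 'b', 'c'])
df2 = pd.DataFrame([[1, 2], [10, 20]], columns=['a', 'b'])
df2['c'] = np.nan

# Cast to the nullable boolean dtype so missing values are stored as <NA>
# instead of forcing the column to float64 or to plain bool.
df1['c'] = df1['c'].astype('boolean')
df2['c'] = df2['c'].astype('boolean')

combined = pd.concat([df1, df2])
print(combined.dtypes)
print(combined)
```

The plain `astype(bool)` cast fills the gaps with True because NaN is a non-zero float and is therefore truthy.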
Unfortunately, the pandas documentation for append doesn't mention this problem. Any idea why pandas behaves this way?
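A possible explanation, offered as an assumption and not as documented pandas internals: a NumPy `bool` array cannot store NaN, so combining a boolean column with a float64 column needs a common dtype, and under NumPy's promotion rules that is float64. The sketch below only demonstrates the NumPy side of this, using two small arrays that stand in for the `c` columns.

```python
import numpy as np

bool_col = np.array([True, False])      # stands in for df1['c']
nan_col = np.array([np.nan, np.nan])    # stands in for df2['c']

# NumPy's common type for bool and float64
common = np.result_type(bool_col.dtype, nan_col.dtype)
print(common)

# Concatenating promotes the booleans to 1.0 and 0.0
stacked = np.concatenate([bool_col, nan_col])
print(stacked.dtype, stacked)

# Casting back to bool: NaN is non-zero, so it becomes True
back = stacked.astype(bool)
print(back)
```

This mirrors both effects seen in the question: the booleans turning into 1.0 and 0.0, and the NaN rows turning into True after the cast back to bool.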