I am trying to figure out whether or not a column in a pandas dataframe is boolean or not (and if so, if it has missing values and so on).
In order to test the function that I created I tried to create a dataframe with a boolean column with missing values. However, I would say that missing values are handled exclusively 'untyped' in python and there are some weird behaviours:
> boolean = pd.Series([True, False, None])
> print(boolean)
0 True
1 False
2 None
dtype: object
so the moment you put None into the list, it is being regarded as object because python is not able to mix the types bool and type(None)=NoneType back into bool. The same thing happens with math.nan
and numpy.nan
. The weirdest things happen when you try to force pandas into an area it does not want to go to :-)
> boolean = pd.Series([True, False, np.nan]).astype(bool)
> print(boolean)
0 True
1 False
2 True
dtype: bool
So 'np.nan' is being casted to 'True'?
Questions:
Given a data table where one column is of type 'object' but in fact it is a boolean column with missing values: how do I figure that out? After filtering for the non-missing values it is still of type 'object'... do I need to implement a try-catch-cast of every column into every imaginable data type in order to see the true nature of columns?
I guess that there is a logical explanation of why np.nan is being casted to True but this is an unwanted behaviour of the software pandas/python itself, right? So should I file a bug report?