
I have a DataFrame column of dtype object that consists mostly of missing values (NaN) plus some 'True' and 'False' string entries. I want to convert it to a boolean dtype, but the NaN entries get converted to True. How can I do this while preserving the NaN values as they are?

1- I tried the .astype() method, which turned the NaN values into True.
2- I tried converting to numeric first and then to boolean, and ended up with the same result.

# Before conversion

In[]:  ri_df.contraband_weapons.value_counts()
Out[]: False    11296
       True       499
       Name: contraband_weapons, dtype: int64

# After conversion

In[]:  ri_df.contraband_weapons.astype('bool').value_counts()
Out[]: True     498385
       False     11296
       Name: contraband_weapons, dtype: int64
smci
Osama Hamdy
  • Possible duplicate of [Getting boolean pandas column that supports NA/ is nullable](https://stackoverflow.com/questions/34520267/getting-boolean-pandas-column-that-supports-na-is-nullable) – smci Sep 21 '19 at 22:58

2 Answers


After a comment by Stef, I completely changed my answer:

If your column holds the strings 'True' and 'False' intermixed with NaN values, you can use replace with a dictionary:

  • replace string 'True' with boolean True,
  • replace string 'False' with boolean False.

Something like:

ri_df['contraband_weapons'] = ri_df['contraband_weapons'].replace({'True': True, 'False': False})

So the code can be quite short.

But the bad news is that the dtype of this column is still object. The reason is that:

  • most values are of bool type,
  • but some of them are NaN, which is actually a special case of float.

Hence, there is no single type shared by all values in this column, so its dtype cannot be bool.
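A minimal sketch of this behaviour on a toy Series (the data is made up for illustration and merely stands in for the real column): after the replacement the values are booleans and NaN, yet the dtype stays object.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real column; the values are invented for this example.
s = pd.Series(['True', np.nan, 'False'], dtype=object)

# Map the strings to real booleans; NaN has no key in the dict, so it is left alone.
converted = s.replace({'True': True, 'False': False})

print(converted.dtype)     # object
print(converted.tolist())  # [True, nan, False]
```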

Edit following the question about "workaround"

I see that you want to preserve the "three-value logic" (True / False / Unknown).

If you want to stay with native Pandas data types, I think there is no workaround, because:

  • bool is either True or False (there is no third option such as "unknown"),
  • NaN is a special case of float,

so you have to live with this "mixture of types".
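For readers on newer releases: pandas 1.0 and later ship a nullable `boolean` extension dtype that can hold True, False and a missing marker (`pd.NA`) in one column. The sketch below assumes such a version is installed and uses a made-up toy Series; the strings are mapped to booleans first and the result is then cast.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real column; the values are invented for this example.
s = pd.Series(['True', np.nan, 'False'], dtype=object)

# Step 1: strings -> Python booleans (NaN stays missing, it has no key in the dict).
# Step 2: cast to the nullable "boolean" extension dtype (assumes pandas >= 1.0).
nullable = s.map({'True': True, 'False': False}).astype('boolean')

print(nullable.dtype)         # boolean
print(nullable.isna().sum())  # 1
```

With this dtype, `value_counts()` counts only the True and False entries by default, and the missing entries remain missing.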

One alternative might be to define a Categorical type with three categories, corresponding to True, False and Unknown, and translate each source value to the respective category.

Then there will be a single data type, but the downside is that if you want any "3-value bool operators / functions", you have to program them on your own.
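A rough sketch of the Categorical idea, on made-up data. The label 'Unknown' is an illustrative choice for this example, not something pandas defines.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real column; the values are invented for this example.
s = pd.Series(['True', np.nan, 'False', 'True'], dtype=object)

# Give missing entries an explicit label ('Unknown' is a hypothetical name).
labels = s.fillna('Unknown')

# A single categorical dtype with exactly three allowed values.
cat = pd.Series(pd.Categorical(labels, categories=['True', 'False', 'Unknown']))

print(cat.dtype)  # category
print(cat.value_counts().to_dict())
```

Logical operators are not defined on such a column, so any three-valued AND / OR would still have to be written by hand.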

Valdi_Bo
  • The question reads "How to convert a column data type from 'string ' to 'boolean'?", so I think the original datatype is string and not bool. – Stef Sep 21 '19 at 18:28
  • @Valdi_Bo Thank you for your valuable contribution. The first part of your answer using a dict to map the values works fine but isn't there any workaround to get the column dtype converted to boolean with preserving the NaN values as Missing entries neither True nor False. – Osama Hamdy Sep 21 '19 at 22:55

You can use eval to convert string 'True'/'False' to boolean True/False and leave the NaNs untouched:

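import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['True', np.nan, 'False']})
# Element types before the conversion
df.applymap(type)
#              Col1
#0    <class 'str'>
#1  <class 'float'>
#2    <class 'str'>

# Evaluate only the non-null strings; the NaN rows are left untouched
df.loc[~df.Col1.isnull(),'Col1'] = df[~df.Col1.isnull()].Col1.map(eval)
# Element types after the conversion
df.applymap(type)
#              Col1
#0   <class 'bool'>
#1  <class 'float'>
#2   <class 'bool'>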
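A variant of the same idea that avoids `eval`, sketched with `ast.literal_eval` from the standard library, which parses Python literals only and so does not execute arbitrary code found in a cell. The `mask` name and the explicit `dtype=object` are choices made for this example.

```python
import ast

import numpy as np
import pandas as pd

# dtype=object keeps the column able to hold str, bool and NaN side by side.
df = pd.DataFrame({'Col1': ['True', np.nan, 'False']}, dtype=object)

# Only the non-null cells are parsed; NaN rows are skipped entirely.
mask = df['Col1'].notna()
df.loc[mask, 'Col1'] = df.loc[mask, 'Col1'].map(ast.literal_eval)

print(df['Col1'].tolist())  # [True, nan, False]
```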
Stef
  • can you try with `df[~df.Col1.isnull()].Col1.map(ast.literal_eval)` , which should be a better option than `eval` – anky Sep 21 '19 at 17:07
  • @anky_91 `ast.literal_eval` works too - why is it a better option than `eval`? – Stef Sep 21 '19 at 17:11
  • umm i just dont prefer it :) [here](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice) is the reason , but `ast` module is specifically made for this so that should be safer check [this](https://stackoverflow.com/questions/15197673/using-pythons-eval-vs-ast-literal-eval) – anky Sep 21 '19 at 17:13
  • @anky_91 `literal_eval` isn't mentioned in the link, it just explains that in many cases you could do without eval (as in our case, where we could use e.g. np.where), but I just read the docs of literal_eval stating "This can be used for safely evaluating strings containing Python values from untrusted sources without the need to parse the values oneself.", so it might be better indeed. Thanks for the comment, I didn't know the `ast` module before. – Stef Sep 21 '19 at 17:21
  • did you check the second link too? bdw np :) – anky Sep 21 '19 at 17:21
  • @anky_91 oops, my bad, I overlooked it. Thanks. – Stef Sep 21 '19 at 17:24