
I have a dataset in which the target variable contains False and True values, but the column is not Boolean; its dtype is object. I have tried the following, but none of them work:

df_new['Y_var'] = df_new['Y_var'].map({'True': 1, 'False': 0})

and

df_new['Y_var'] = df_new['Y_var'].replace(['False', 'True'], [0, 1])

and

df_new['Y_var'].astype(bool).astype(int)

Output of df_new['Y_var'].to_list():

[' False.', ' False.', ' False.', ' False.', ' True.', ' False.', ' False.', ' False.', ' False.', ' True.',.....]
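A minimal reproduction shows why the first two attempts fail. It uses a few values copied from the output above, so it is a sample and not the full column. Each value has a leading space and a trailing dot, so none of them equals 'True' or 'False' exactly:

```python
import pandas as pd

# Sample values copied from the to_list() output (note the leading space and trailing dot)
df_new = pd.DataFrame({"Y_var": [" False.", " False.", " True.", " False.", " True."]})

# Attempt 1: no dictionary key matches any value exactly, so every entry becomes NaN
mapped = df_new["Y_var"].map({"True": 1, "False": 0})
print(mapped.isna().all())  # True

# Attempt 2: replace() only swaps exact matches, so the column comes back unchanged
replaced = df_new["Y_var"].replace(["False", "True"], [0, 1])
print(replaced.equals(df_new["Y_var"]))  # True

# Printing the repr of each distinct value makes the hidden characters visible
print(df_new["Y_var"].map(repr).unique())
```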
  • You would have needed `map({True: 1, False: 0})` without quotes. If you can't solve it, please provide the output of `df_new['Y_var'].to_list()` (before any change) – mozway Aug 25 '22 at 11:14
  • @mozway No, it isn't working; it gives me a NaN column. As I said, the column is not Boolean but object type. – dsnoob27 Aug 25 '22 at 11:19
  • Please provide the requested output; an object column can contain booleans… – mozway Aug 25 '22 at 11:37
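mozway's point that an object column can still hold booleans can be checked with a small sketch on made-up data. The unquoted keys match real bool values but not strings:

```python
import pandas as pd

# An object-dtype column holding real Python booleans (not strings)
s = pd.Series([True, False, True], dtype=object)
print(s.dtype)  # object

# Boolean keys match the stored boolean values
print(s.map({True: 1, False: 0}).tolist())  # [1, 0, 1]

# The same mapping on string values finds no match and yields NaN
s_str = pd.Series(["True", "False"])
print(s_str.map({True: 1, False: 0}).isna().all())  # True
```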

2 Answers


Are you sure they don't work? The first one works for me.

import pandas as pd

df_new = pd.DataFrame([
    {'Y_var': "True"},
    {'Y_var': "False"},
    {'Y_var': "True"},
    {'Y_var': "False"},
    {'Y_var': "False"},
    {'Y_var': "True"},
])

df_new['Y_var'] = df_new['Y_var'].map({'True': 1, 'False': 0})
print(df_new)

Gives me:

   Y_var
0      1
1      0
2      1
3      0
4      0
5      1

This should work:

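import pandas as pd

# bool() of any non-empty string is True, so this only works if Y_var holds real booleans
bool_list = [bool(x) for x in df_new["Y_var"]]
temp_df = pd.DataFrame({"Y_var": bool_list})
df_new.update(temp_df)  # aligns on the index, so assumes df_new has a default RangeIndex
df_new["Y_var"] = df_new["Y_var"] * 1  # True/False -> 1/0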

Here is a picture of it on my machine: [screenshot not available]
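This quick check explains why converting with bool(), and the question's third attempt with astype(bool), cannot work on these strings. Python treats every non-empty string as True. It is a minimal sketch using sample values in the format shown in the question:

```python
import pandas as pd

# Any non-empty string is truthy, so bool() cannot tell " True." from " False."
print(bool(" False."))  # True
print(bool(" True."))   # True

s = pd.Series([" False.", " True.", " False."])
# astype(bool) applies the same truthiness rule, so every row becomes 1
print(s.astype(bool).astype(int).tolist())  # [1, 1, 1]
```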

  • No, it is not working. Error: `IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer` – dsnoob27 Aug 25 '22 at 11:15
  • @divinity27 Try the second example you gave in your question, but get rid of the quotes around True and False. – Benjamin Zatman Aug 25 '22 at 11:17
  • Still not working, NaN error. – dsnoob27 Aug 25 '22 at 11:20
  • @divinity27 I updated my answer, please try now. – Benjamin Zatman Aug 25 '22 at 11:23
  • With your updated answer my column remains unchanged. The column is not Boolean but object type; did you read that? – dsnoob27 Aug 25 '22 at 11:25
  • @divinity27 I updated it again; please try it and let me know how it goes. It works in my test case. – Benjamin Zatman Aug 25 '22 at 11:33
  • There is no reason whatsoever to use `eval` here… – mozway Aug 25 '22 at 11:38
  • @mozway Yes, there is: he has True and False as strings, and you use eval to convert them to booleans. You may be able to use bool() instead of eval, but one of them is still needed. – Benjamin Zatman Aug 25 '22 at 11:40
  • No, I can assure you, this isn't a valid reason. In any case, OP should provide a clear reproducible example as DataFrame constructor, or dictionary/list. – mozway Aug 25 '22 at 11:41
  • @mozway Yes, he should provide that information, but from the information I have, what I did should be right, as you can see from the picture. – Benjamin Zatman Aug 25 '22 at 11:47
  • It "works", it doesn't mean that it's right. You should probably never have to use `eval` in python code in your life. Anyway, let's wait for OP's clarification on the real data. – mozway Aug 25 '22 at 11:48
  • @mozway [' False.', ' False.', ' False.', ' False.', ' True.', ' False.', ' False.', ' False.', ' False.', ' True.',.....] This is my column to a list output – dsnoob27 Aug 25 '22 at 11:50
  • OK, so you have neither `True` nor `"True"`. You have `" True."`, the space and dot matter here. `df_new['Y_var'].map({' True.': 1, ' False.': 0})` should work. Or maybe `df_new['Y_var'].str.contains('true', case=False)` if you only expect True/False variations. – mozway Aug 25 '22 at 11:51
  • @Benjamin feel free to update your answer and to remove this ugly `eval`. (I'll then remove my downvote). – mozway Aug 25 '22 at 11:54
  • But I am still getting no change, @mozway. My code: `df_new['Y_var'].map({'True.': 1, 'False.': 0})` `print(df_new['Y_var'])` My output: `0 False. 1 False. 2 False. 4 False. 5 True. ... 3328 False. 3329 False. 3330 False. 3331 True. 3332 False.` – dsnoob27 Aug 25 '22 at 11:55
  • Sorry I forgot the change of type, `df_new['Y_var'].str.contains('true', case=False).astype(int)` – mozway Aug 25 '22 at 11:56
  • @mozway better? – Benjamin Zatman Aug 25 '22 at 12:03
  • Yes, it's working now, thanks @mozway. I guess it really helps to check for small formatting issues in the data. – dsnoob27 Aug 25 '22 at 12:04
  • @divinity27 next time, **please**, provide a reproducible input from the beginning, this saves everyone's time ;) – mozway Aug 25 '22 at 12:06
  • @BenjaminZatman no, for different reasons. Feel free to use my answer in yours. – mozway Aug 25 '22 at 12:07
  • @mozway, I will keep that in mind – dsnoob27 Aug 25 '22 at 12:11
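This sketch consolidates the two fixes mozway suggests in the comments and runs them on a small sample in the format shown in the question. The third option is an alternative added here and does not come from the thread: it strips the space and dot before mapping. The result also has to be assigned back to the column. The last attempt in the comments skipped that step, and its keys also lacked the leading space.

```python
import pandas as pd

df_new = pd.DataFrame({"Y_var": [" False.", " False.", " True.", " False.", " True."]})

# Option 1 (from the comments): map the exact strings, including the leading space and dot
exact = df_new["Y_var"].map({" True.": 1, " False.": 0})

# Option 2 (from the comments): flag any value containing "true", case-insensitively
contains = df_new["Y_var"].str.contains("true", case=False).astype(int)

# Option 3 (alternative, not from the thread): strip spaces and dots from both ends, then map
stripped = df_new["Y_var"].str.strip(" .").map({"True": 1, "False": 0})

print(exact.tolist())     # [0, 0, 1, 0, 1]
print(contains.tolist())  # [0, 0, 1, 0, 1]
print(stripped.tolist())  # [0, 0, 1, 0, 1]

# Assign the result back; calling map() alone leaves df_new unchanged
df_new["Y_var"] = contains
```

Option 2 is the most forgiving of formatting noise. Options 1 and 3 return NaN for any unexpected value, which can help catch bad rows.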