What I am doing: using pandas to analyse a dataset taken from a survey. Several columns hold Yes or No answers. What I want to do: change those columns from dtype object to boolean, with Yes = True and No = False. I would also like to know whether this can be done for several columns at once.

Thanks.

1 Answer

This works for categorical data and can be applied to multiple columns. To encode categorical data, LabelEncoder maps the categories to 0, 1, 2, ... according to your data. That creates a new problem: because a single column now holds different numbers, a model may treat the values as ordered, 0 < 1 < 2, which isn’t the case at all. To overcome this problem, we use OneHotEncoder.

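from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# X is assumed to be a NumPy object array of features (e.g. X = df.values)
# Label-encode columns 1 and 2 to integer codes (0, 1, 2, ...)
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# One-hot encode column 1. The categorical_features argument was removed in
# scikit-learn 0.22, so ColumnTransformer selects the column instead.
ct = ColumnTransformer([("onehot", OneHotEncoder(), [1])], remainder="passthrough", sparse_threshold=0)
X = ct.fit_transform(X)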
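
For the Yes/No columns described in the question, a plain pandas mapping may be simpler than an encoder. The following is a minimal sketch, not a definitive implementation: the helper `yes_no_to_bool`, the sample frame, and the column names `owns_car` and `has_pets` are all made up for illustration. It assumes the answers are spelled exactly "Yes" and "No", and it uses pandas' nullable "boolean" dtype so that any other value ends up as a missing value rather than being silently coerced.

```python
import pandas as pd

def yes_no_to_bool(frame, columns):
    """Return a copy of frame with the given Yes/No columns as booleans.

    Hypothetical helper: values other than "Yes"/"No" become <NA>.
    """
    mapping = {"Yes": True, "No": False}
    out = frame.copy()
    for col in columns:
        # map() turns unmatched values into NaN; the nullable "boolean"
        # dtype then stores them as <NA> instead of forcing True/False
        out[col] = out[col].map(mapping).astype("boolean")
    return out

# Made-up survey data for illustration
df = pd.DataFrame({
    "respondent": ["a", "b", "c"],
    "owns_car": ["Yes", "No", "Yes"],
    "has_pets": ["No", "No", "Yes"],
})

# Convert several columns at once by listing them
result = yes_no_to_bool(df, ["owns_car", "has_pets"])
print(result.dtypes)
```

If every answer is guaranteed to be "Yes" or "No", calling `.astype(bool)` on the converted columns afterwards should give the plain NumPy bool dtype.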
Ishaan Javali