Questions tagged [one-hot-encoding]

One-hot encoding is a method of encoding categorical variables as numerical data that machine learning algorithms can work with. It is most often used during feature engineering for an ML model. It converts each categorical value into a new binary column and assigns a value of 1 or 0 to those columns.

Often also called dummy encoding, one-hot encoding applies to categorical variables whose values have no ordinal relationship. It is the most widespread approach, and it works very well unless the categorical variable takes on a large number of unique values, since every unique value adds another column. One-hot encoding creates a new binary column for each possible value in the original data. Each row stores a 1 in the column that matches its categorical value and a 0 in all the others.
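
As a minimal sketch of the idea described above, the following plain-Python snippet builds one binary column per unique value. The helper name `one_hot_encode` and the sample `colors` list are illustrative assumptions, not part of any library; in practice, tools named in the questions below, such as pandas `get_dummies` or scikit-learn's `OneHotEncoder`, are the usual choice.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row holds one 0/1 flag per category.

    Hypothetical helper for illustration only, not a library function.
    """
    # Sorting the unique values fixes a reproducible column order.
    categories = sorted(set(values))
    # Map each category to the position of its column.
    column_of = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)   # start with all zeros
        row[column_of[value]] = 1     # flag the column for this row's value
        rows.append(row)
    return categories, rows


# Assumed sample data: a single categorical feature with three unique values.
colors = ["red", "blue", "green", "red"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each encoded row contains exactly one 1, and the number of columns equals the number of unique values, which is why the approach becomes unwieldy for high-cardinality variables.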

1224 questions
352
votes
22 answers

Convert array of indices to one-hot encoded array in NumPy

Given a 1D array of indices: a = array([1, 0, 3]) I want to one-hot encode this as a 2D array: b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])
James Atwood
  • 4,289
  • 2
  • 17
  • 17
248
votes
22 answers

How can I one hot encode in Python?

I have a machine learning classification problem with 80% categorical variables. Must I use one hot encoding if I want to use some classifier for the classification? Can I pass the data to a classifier without the encoding? I am trying to do the…
avicohen
  • 2,897
  • 6
  • 16
  • 16
80
votes
6 answers

Can sklearn random forest directly handle categorical features?

Say I have a categorical feature, color, which takes the values ['red', 'blue', 'green', 'orange'], and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I change it to four dummy variables), how do I tell…
tkunk
  • 1,378
  • 1
  • 13
  • 19
72
votes
5 answers

Running get_dummies on several DataFrame columns?

How can one idiomatically run a function like get_dummies, which expects a single column and returns several, on multiple DataFrame columns?
Emre
  • 5,976
  • 7
  • 29
  • 42
59
votes
9 answers

One Hot Encoding using numpy

If the input is zero I want to make an array which looks like this: [1,0,0,0,0,0,0,0,0,0] and if the input is 5: [0,0,0,0,0,1,0,0,0,0] For the above I wrote: np.put(np.zeros(10),5,1) but it did not work. Is there any way in which this can be…
Abhijay Ghildyal
  • 4,044
  • 6
  • 33
  • 54
58
votes
3 answers

Feature names from OneHotEncoder

I am using OneHotEncoder to encode a few categorical variables (e.g. Sex and AgeGroup). The resulting feature names from the encoder are like 'x0_female', 'x0_male', 'x1_0.0', 'x1_15.0' etc. >>> train_X = pd.DataFrame({'Sex':['male', 'female']*3,…
Supratim Haldar
  • 2,376
  • 3
  • 16
  • 26
51
votes
2 answers

Adding dummy columns to the original dataframe

I have a dataframe that looks like this: EXEC_FULLNAME YEAR BECAMECEO CO_PER_ROL 5622 Ira A. Eichner 1992 19550101 5622 Ira A. Eichner 1993 19550101 5622 Ira A. Eichner 1994 19550101 5623 David P. Storch…
Brad
  • 569
  • 1
  • 4
  • 8
41
votes
5 answers

How to one hot encode several categorical variables in R

I'm working on a prediction problem and I'm building a decision tree in R, I have several categorical variables and I'd like to one-hot encode them consistently in my training and testing set. I managed to do it on my training data with : temps <-…
xeco
  • 511
  • 1
  • 4
  • 3
39
votes
3 answers

One hot encoding of string categorical features

I'm trying to perform a one hot encoding of a trivial dataset. data = [['a', 'dog', 'red'], ['b', 'cat', 'green']] What's the best way to preprocess this data using Scikit-Learn? On first instinct, you'd look towards Scikit-Learn's…
hlin117
  • 20,764
  • 31
  • 72
  • 93
32
votes
6 answers

Using Scikit-Learn OneHotEncoder with a Pandas DataFrame

I'm trying to replace a column within a Pandas DataFrame containing strings into a one-hot encoded equivalent using Scikit-Learn's OneHotEncoder. My code below doesn't work: from sklearn.preprocessing import OneHotEncoder # data is a Pandas…
dd.
  • 671
  • 1
  • 5
  • 13
32
votes
5 answers

How to give column names after one-hot encoding with sklearn?

Here is my question; I hope someone can help me figure it out. To explain, there are more than 10 categorical columns in my data set and each of them has 200-300 categories. I want to convert them into binary values. For that I used first label…
Aditya Pratama
  • 657
  • 1
  • 8
  • 21
27
votes
3 answers

In TensorFlow, what is the argument 'axis' in the function 'tf.one_hot'

Could anyone help with an explanation of what axis is in TensorFlow's one_hot function? According to the documentation: axis: The axis to fill (default: -1, a new inner-most axis) Closest I came to an answer on SO was an explanation relevant to…
25
votes
12 answers

OneHotEncoder categorical_features deprecated, how to transform specific column

I need to transform the independent field from string to arithmetical notation. I am using OneHotEncoder for the transformation. My dataset has many independent columns of which some are as: Country | Age …
Hassaan
  • 3,931
  • 11
  • 34
  • 67
24
votes
4 answers

Convert a 2d matrix to a 3d one hot matrix numpy

I have an np matrix and I want to convert it to a 3d array with one hot encoding of the elements as the third dimension. Is there a way to do this without looping over each row? E.g. a=[[1,3], [2,4]] should be made into b=[[1,0,0,0], [0,0,1,0], …
Rahul
  • 3,220
  • 4
  • 22
  • 28
22
votes
3 answers

converting tensor to one hot encoded tensor of indices

I have my label tensor of shape (1,1,128,128,128) in which the values might range from 0 to 24. I want to convert this to a one hot encoded tensor, using the nn.functional.one_hot function: n = 24 one_hot = torch.nn.functional.one_hot(indices, n) but…
Ryan
  • 8,459
  • 14
  • 40
  • 66