Questions tagged [one-hot-encoding]

One-hot encoding is a method for converting categorical variables into numerical data that machine learning algorithms can work with. It is most often used during feature engineering for an ML model. Each category value becomes its own new column, and every row is assigned a binary value of 1 or 0 in those columns.

Also known as dummy encoding, one-hot encoding is suited to categorical variables whose values have no ordinal relationship. It is the most widespread approach and works well unless the categorical variable takes on a large number of unique values, since each unique value adds a column. One-hot encoding creates new binary columns, one for each possible value in the original data. Each row stores a 1 in the column that matches its category and 0 in the others.
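As a rough illustration of the description above, here is a minimal dependency-free sketch. The helper name `one_hot_encode` and the sample `colors` list are made up for this example; in practice a library routine would normally be used instead.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one binary column per distinct value."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)   # start with all zeros
        row[column_of[value]] = 1     # mark the column for this row's category
        rows.append(row)
    return categories, rows


# Hypothetical sample data for illustration only.
colors = ["red", "blue", "green", "blue"]
categories, encoded = one_hot_encode(colors)
```

Each row of `encoded` contains exactly one 1, and the number of columns equals the number of distinct values, which is why the approach becomes costly for variables with many unique values.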

1224 questions

352

votes

22 answers

Convert array of indices to one-hot encoded array in NumPy

Given a 1D array of indices: a = array([1, 0, 3]) I want to one-hot encode this as a 2D array: b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])
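One common way to do this, sketched here under the assumption that the number of classes is `a.max() + 1` (the question does not state it), is to allocate a zero matrix and set one entry per row with integer indexing:

```python
import numpy as np

a = np.array([1, 0, 3])

# Assumption: the class count is inferred from the largest index.
num_classes = a.max() + 1

b = np.zeros((a.size, num_classes), dtype=int)
# Row i gets a 1 in column a[i].
b[np.arange(a.size), a] = 1
```

If the class count is known in advance, passing it explicitly avoids a too-narrow result when the largest class happens to be absent from `a`.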

asked Apr 23 '15 at 18:24

James Atwood


248

votes

22 answers

How can I one hot encode in Python?

I have a machine learning classification problem with 80% categorical variables. Must I use one hot encoding if I want to use some classifier for the classification? Can I pass the data to a classifier without the encoding? I am trying to do the…

python pandas machine-learning one-hot-encoding

asked May 18 '16 at 07:26

avicohen


votes

6 answers

Can sklearn random forest directly handle categorical features?

Say I have a categorical feature, color, which takes the values ['red', 'blue', 'green', 'orange'], and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I change it to four dummy variables), how do I tell…

python scikit-learn random-forest one-hot-encoding

asked Jul 12 '14 at 16:54

tkunk


votes

5 answers

Running get_dummies on several DataFrame columns?

How can one idiomatically run a function like get_dummies, which expects a single column and returns several, on multiple DataFrame columns?
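One option, shown as a sketch with a made-up DataFrame, is to pass the whole frame to `pd.get_dummies` and name the columns to encode through its `columns` parameter; columns not listed are left as they are:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "S"],
    "price": [10, 12, 9],
})

# Encode only the listed columns; "price" passes through unchanged.
encoded = pd.get_dummies(df, columns=["color", "size"])
```

The new columns are named with the original column name as a prefix (for example `color_red`), which keeps dummies from different source columns apart.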

python pandas dataframe one-hot-encoding

asked Jun 08 '14 at 19:04

Emre


votes

9 answers

One Hot Encoding using numpy

If the input is zero I want to make an array which looks like this: [1,0,0,0,0,0,0,0,0,0] and if the input is 5: [0,0,0,0,0,1,0,0,0,0] For the above I wrote: np.put(np.zeros(10),5,1) but it did not work. Is there any way in which, this can be…
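A likely reason the snippet appears not to work is that `np.put` modifies its array argument in place and returns `None`, so the result of `np.put(np.zeros(10), 5, 1)` is lost. A minimal sketch that keeps a reference to the array (the helper name `one_hot` and the default size of 10 are assumptions for this example):

```python
import numpy as np


def one_hot(index, size=10):
    """Return a length-`size` vector with a 1 at position `index`."""
    vec = np.zeros(size, dtype=int)
    np.put(vec, index, 1)  # modifies vec in place and returns None
    return vec


zero_vec = one_hot(0)
five_vec = one_hot(5)

# An alternative is to take a row of the identity matrix.
five_from_eye = np.eye(10, dtype=int)[5]
```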

python numpy one-hot-encoding

asked Jul 26 '16 at 14:15

Abhijay Ghildyal


votes

3 answers

Feature names from OneHotEncoder

I am using OneHotEncoder to encode a few categorical variables (e.g. Sex and AgeGroup). The resulting feature names from the encoder look like 'x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', etc. >>> train_X = pd.DataFrame({'Sex':['male', 'female']*3,…

python-3.x scikit-learn one-hot-encoding

asked Feb 07 '19 at 10:13

Supratim Haldar


votes

2 answers

Adding dummy columns to the original dataframe

I have a dataframe that looks like this: EXEC_FULLNAME YEAR BECAMECEO CO_PER_ROL 5622 Ira A. Eichner 1992 19550101 5622 Ira A. Eichner 1993 19550101 5622 Ira A. Eichner 1994 19550101 5623 David P. Storch…

python python-3.x pandas dataframe one-hot-encoding

asked Apr 22 '14 at 01:19

Brad

votes

5 answers

How to one hot encode several categorical variables in R

I'm working on a prediction problem and building a decision tree in R. I have several categorical variables and I'd like to one-hot encode them consistently in my training and testing sets. I managed to do it on my training data with: temps <-…

r one-hot-encoding

asked Feb 06 '18 at 18:16

xeco

votes

3 answers

One hot encoding of string categorical features

I'm trying to perform a one hot encoding of a trivial dataset. data = [['a', 'dog', 'red'] ['b', 'cat', 'green']] What's the best way to preprocess this data using Scikit-Learn? On first instinct, you'd look towards Scikit-Learn's…

python encoding scikit-learn one-hot-encoding

asked Jan 30 '16 at 21:50

hlin117


votes

6 answers

Using Scikit-Learn OneHotEncoder with a Pandas DataFrame

I'm trying to replace a column of strings in a Pandas DataFrame with its one-hot encoded equivalent using Scikit-Learn's OneHotEncoder. My code below doesn't work: from sklearn.preprocessing import OneHotEncoder # data is a Pandas…

python pandas machine-learning scikit-learn one-hot-encoding

asked Sep 25 '19 at 14:47

dd.

votes

5 answers

How to give column names after one-hot encoding with sklearn?

Here is my question; I hope someone can help me figure it out. To explain, there are more than 10 categorical columns in my data set and each of them has 200-300 categories. I want to convert them into binary values. For that I used first label…

python encoding scikit-learn one-hot-encoding

asked May 28 '19 at 09:19

Aditya Pratama

votes

3 answers

In TensorFlow, what is the argument 'axis' in the function 'tf.one_hot'

Could anyone help with an explanation of what axis is in TensorFlow's one_hot function? According to the documentation: axis: The axis to fill (default: -1, a new inner-most axis) Closest I came to an answer on SO was an explanation relevant to…

python-3.x tensorflow machine-learning multidimensional-array one-hot-encoding

asked Jan 03 '18 at 18:11

tinonetic


votes

12 answers

OneHotEncoder categorical_features deprecated, how to transform specific column

I need to transform the independent field from string to arithmetical notation. I am using OneHotEncoder for the transformation. My dataset has many independent columns of which some are as: Country | Age …

python machine-learning categorical-data one-hot-encoding

asked Jan 24 '19 at 11:32

Hassaan


votes

4 answers

Convert a 2d matrix to a 3d one hot matrix numpy

I have a NumPy matrix and I want to convert it to a 3D array with the one-hot encoding of the elements as the third dimension. Is there a way to do this without looping over each row? E.g. a=[[1,3], [2,4]] should be made into b=[[1,0,0,0], [0,0,1,0], …
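A loop-free sketch is to index the rows of an identity matrix with the whole array. Two assumptions are read off the visible part of the example and are not stated in the question: the values are 1-based (1 maps to [1,0,0,0]) and there are 4 classes.

```python
import numpy as np

a = np.array([[1, 3],
              [2, 4]])

# Assumptions: 4 classes and 1-based values, hence the "a - 1" shift.
num_classes = 4

# Fancy indexing replaces every element with the matching identity row,
# appending a new last axis of length num_classes.
b = np.eye(num_classes, dtype=int)[a - 1]
```

The result has shape `a.shape + (num_classes,)`, so the one-hot vectors sit along the third dimension.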

python numpy vectorization one-hot-encoding

asked Apr 30 '16 at 21:15

Rahul


votes

3 answers

converting tensor to one hot encoded tensor of indices

I have a label tensor of shape (1,1,128,128,128) in which the values might range from 0 to 24. I want to convert this to a one-hot encoded tensor using the nn.functional.one_hot function: n = 24 one_hot = torch.nn.functional.one_hot(indices, n) but…

pytorch one-hot-encoding

asked Jun 09 '19 at 09:46

Ryan

