
I am trying to construct a lasso regression prediction model. I encoded all my categorical integer features using a one-hot (aka one-of-K) scheme with OneHotEncoder in scikit-learn. Based on the result, only 51 parameters actually influence the prediction model. I want to investigate these parameters, but they are encoded as described above. Do you know how I can extract which categorical integer feature corresponds to which columns of the one-hot encoded array? Thanks!

  • Check a `feature_indices_` attribute. – hellpanderr Nov 28 '15 at 10:55
  • Possible duplicate of [How to reverse sklearn.OneHotEncoder transform to recover original data?](http://stackoverflow.com/questions/22548731/how-to-reverse-sklearn-onehotencoder-transform-to-recover-original-data) – Mack Mar 11 '17 at 20:52

3 Answers


Using the `active_features_`, `feature_indices_`, and `n_values_` attributes of `sklearn.preprocessing.OneHotEncoder`, a vector of the original category values, ordered by their column position in the one-hot array, can be built as follows:

import numpy as np
from sklearn import preprocessing

enc = preprocessing.OneHotEncoder()
# four samples, three categorical integer features
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
# subtracting each feature's starting column index recovers the category
# value encoded by each one-hot column
enc.active_features_ - np.repeat(enc.feature_indices_[:-1], enc.n_values_)
# array([0, 1, 0, 1, 2, 0, 1, 2, 3], dtype=int64)
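
To tie this back to the question: with the same attributes, the nonzero lasso coefficients can be mapped to (original feature, category value) pairs. A minimal sketch, assuming the older scikit-learn attributes above are still available; the coefficient vector `coef` is made up here for illustration:

import numpy as np
from sklearn import preprocessing

enc = preprocessing.OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# hypothetical lasso coefficients, one per one-hot column (9 columns here)
coef = np.array([0.0, 1.2, 0.0, 0.0, -0.7, 0.0, 0.0, 0.3, 0.0])

# original feature index of every active one-hot column
feat_idx = np.searchsorted(enc.feature_indices_, enc.active_features_, side='right') - 1
# category value encoded by every active one-hot column
cat_val = enc.active_features_ - enc.feature_indices_[feat_idx]

for col in np.flatnonzero(coef):
    print('column %d -> feature %d, category %d (coef %.2f)'
          % (col, feat_idx[col], cat_val[col], coef[col]))
# column 1 -> feature 0, category 1 (coef 1.20)
# column 4 -> feature 1, category 2 (coef -0.70)
# column 7 -> feature 2, category 2 (coef 0.30)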

Also, the original data can be recovered from the one-hot array as follows:

x = enc.transform([[0, 1, 1], [1, 2, 3]]).toarray()
# array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.],
#        [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.]])

# for each row, pick the columns that are set and subtract the feature offsets
cond = x > 0
[enc.active_features_[c.ravel()] - enc.feature_indices_[:-1] for c in cond]
# [array([0, 1, 1], dtype=int64), array([1, 2, 3], dtype=int64)]
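
Note that in newer scikit-learn releases the `active_features_`, `feature_indices_` and `n_values_` attributes were deprecated and later removed; the fitted encoder's `categories_` attribute, `get_feature_names_out()` (recent versions; older ones have `get_feature_names()`) and `inverse_transform()` cover the same ground. A minimal sketch of that newer API on the same example data:

from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# the category values of each original feature, in one-hot column order
print(enc.categories_)
# [array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3])]

# one name per one-hot column, e.g. 'x2_3' = third input feature, category 3
print(enc.get_feature_names_out())
# ['x0_0' 'x0_1' 'x1_0' 'x1_1' 'x1_2' 'x2_0' 'x2_1' 'x2_2' 'x2_3']

# recover the original rows from the encoded matrix
encoded = enc.transform([[0, 1, 1], [1, 2, 3]])
print(enc.inverse_transform(encoded))
# [[0 1 1]
#  [1 2 3]]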
BMW

This works if you have the fitted encoder saved to disk: load it back and inspect its `categories_` attribute (available in newer scikit-learn versions):

import pickle

# load the previously fitted OneHotEncoder that was pickled to 'model.pickle'
with open('model.pickle', 'rb') as handle:
    one_hot_encoder = pickle.load(handle)

# one array of category values per original feature, in column order
print(one_hot_encoder.categories_)
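
For context, a minimal sketch of how such a pickle could be produced in the first place; the filename 'model.pickle' and the sample data are assumptions for illustration:

import pickle
from sklearn.preprocessing import OneHotEncoder

# fit an encoder on some categorical integer data and persist it
enc = OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
with open('model.pickle', 'wb') as handle:
    pickle.dump(enc, handle)

# later: load it back and read off the category values per original feature
with open('model.pickle', 'rb') as handle:
    loaded = pickle.load(handle)
print(loaded.categories_)
# [array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3])]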
Cathy

I designed ple to extend scikit-learn's Pipeline and FeatureUnion; with it you can also trace categorical features back through one-hot encoding and other preprocessing steps. Furthermore, the transform can be 'drawn' with GraphX.

You can find ple on my GitHub page.