
I am trying to construct a lasso regression prediction model. I encoded all my categorical integer features using a one-hot (aka one-of-K) scheme with OneHotEncoder in scikit-learn. Based on the result, only 51 parameters actually influence the prediction model. I want to investigate these parameters, but they are encoded as described above. Do you know how I can extract which categorical integer feature corresponds to which columns of the one-hot encoded array? Thanks!

  • Check a `feature_indices_` attribute. – hellpanderr Nov 28 '15 at 10:55
  • Possible duplicate of [How to reverse sklearn.OneHotEncoder transform to recover original data?](http://stackoverflow.com/questions/22548731/how-to-reverse-sklearn-onehotencoder-transform-to-recover-original-data) – Mack Mar 11 '17 at 20:52

3 Answers


Using the `active_features_`, `feature_indices_`, and `n_values_` attributes of `sklearn.preprocessing.OneHotEncoder`, a vector of the original category values, ordered by their column position in the one-hot array, can be built as follows:

import numpy as np
from sklearn import preprocessing

enc = preprocessing.OneHotEncoder()
# four samples, three categorical integer features
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
# subtracting each feature's starting column index recovers the category
# value encoded by each one-hot column
enc.active_features_ - np.repeat(enc.feature_indices_[:-1], enc.n_values_)
# array([0, 1, 0, 1, 2, 0, 1, 2, 3], dtype=int64)
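
To tie this back to the question: with the same attributes, the nonzero lasso coefficients can be mapped to (original feature, category value) pairs. A minimal sketch, assuming the older scikit-learn attributes above are still available; the coefficient vector `coef` is made up here for illustration:

import numpy as np
from sklearn import preprocessing

enc = preprocessing.OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# hypothetical lasso coefficients, one per one-hot column (9 columns here)
coef = np.array([0.0, 1.2, 0.0, 0.0, -0.7, 0.0, 0.0, 0.3, 0.0])

# original feature index of every active one-hot column
feat_idx = np.searchsorted(enc.feature_indices_, enc.active_features_, side='right') - 1
# category value encoded by every active one-hot column
cat_val = enc.active_features_ - enc.feature_indices_[feat_idx]

for col in np.flatnonzero(coef):
    print('column %d -> feature %d, category %d (coef %.2f)'
          % (col, feat_idx[col], cat_val[col], coef[col]))
# column 1 -> feature 0, category 1 (coef 1.20)
# column 4 -> feature 1, category 2 (coef -0.70)
# column 7 -> feature 2, category 2 (coef 0.30)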

Also, the original data can be recovered from the one-hot array as follows:

x = enc.transform([[0, 1, 1], [1, 2, 3]]).toarray()
# array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.],
#        [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.]])

# for each row, pick the columns that are set and subtract the feature offsets
cond = x > 0
[enc.active_features_[c.ravel()] - enc.feature_indices_[:-1] for c in cond]
# [array([0, 1, 1], dtype=int64), array([1, 2, 3], dtype=int64)]
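
Note that in newer scikit-learn releases the `active_features_`, `feature_indices_` and `n_values_` attributes were deprecated and later removed; the fitted encoder's `categories_` attribute, `get_feature_names_out()` (recent versions; older ones have `get_feature_names()`) and `inverse_transform()` cover the same ground. A minimal sketch of that newer API on the same example data:

from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# the category values of each original feature, in one-hot column order
print(enc.categories_)
# [array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3])]

# one name per one-hot column, e.g. 'x2_3' = third input feature, category 3
print(enc.get_feature_names_out())
# ['x0_0' 'x0_1' 'x1_0' 'x1_1' 'x1_2' 'x2_0' 'x2_1' 'x2_2' 'x2_3']

# recover the original rows from the encoded matrix
encoded = enc.transform([[0, 1, 1], [1, 2, 3]])
print(enc.inverse_transform(encoded))
# [[0 1 1]
#  [1 2 3]]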
BMW

This works if you have the fitted encoder saved to disk: load it back and inspect its `categories_` attribute (available in newer scikit-learn versions):

import pickle

# load the previously fitted OneHotEncoder that was pickled to 'model.pickle'
with open('model.pickle', 'rb') as handle:
    one_hot_encoder = pickle.load(handle)

# one array of category values per original feature, in column order
print(one_hot_encoder.categories_)
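
For context, a minimal sketch of how such a pickle could be produced in the first place; the filename 'model.pickle' and the sample data are assumptions for illustration:

import pickle
from sklearn.preprocessing import OneHotEncoder

# fit an encoder on some categorical integer data and persist it
enc = OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
with open('model.pickle', 'wb') as handle:
    pickle.dump(enc, handle)

# later: load it back and read off the category values per original feature
with open('model.pickle', 'rb') as handle:
    loaded = pickle.load(handle)
print(loaded.categories_)
# [array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3])]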
Cathy

I designed ple to extend scikit-learn's Pipeline and FeatureUnion; with it you can also trace categorical features back through one-hot encoding and other preprocessing steps. Furthermore, the transform can be 'drawn' with GraphX.

You can find ple on my GitHub page.