Untransform after OneHotEncoder

Question

I'm using sklearn's OneHotEncoder, but I want to untransform my data. Any idea how to do that?

>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])  
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

But I want to be able to do the following:

>>> enc.untransform(array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]]))
[[0, 1, 1]]

How would I go about doing this?

For context, I've built a neural network that learns in the one-hot encoded space, and I now want to use the network to make real predictions that need to be in the original data format.
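A network trained in the one-hot space usually outputs scores or probabilities rather than exact 0/1 vectors. Before mapping back, each feature's block of columns has to be collapsed to its highest-scoring category. The sketch below uses only numpy and assumes two things: the network emits one score per encoded column, in the encoder's column order, and `categories` is the per-feature list of category arrays (in scikit-learn 0.20+ that would be `enc.categories_`). `decode_scores` is a hypothetical helper name.

```python
import numpy as np


def decode_scores(scores, categories):
    """Map a (n_samples, n_encoded_columns) score matrix back to original values.

    Hypothetical helper: assumes columns are laid out feature by feature, with
    len(categories[i]) columns for feature i, in the same order as categories[i].
    """
    scores = np.asarray(scores)
    sizes = [len(cats) for cats in categories]
    if scores.shape[1] != sum(sizes):
        raise ValueError("score matrix width does not match the category layout")
    # Column boundaries of each feature's block, e.g. [0, 2, 5, 9] for sizes 2, 3, 4.
    bounds = np.concatenate(([0], np.cumsum(sizes)))
    decoded = []
    for i, cats in enumerate(categories):
        block = scores[:, bounds[i]:bounds[i + 1]]
        # Pick the highest-scoring column within this feature's block.
        decoded.append(np.asarray(cats)[block.argmax(axis=1)])
    return np.column_stack(decoded)


# Categories matching the question's data: 2, 3 and 4 integer values per feature.
categories = [np.arange(2), np.arange(3), np.arange(4)]
soft = np.array([[0.9, 0.1, 0.2, 0.7, 0.1, 0.05, 0.8, 0.1, 0.05]])
print(decode_scores(soft, categories).tolist())  # [[0, 1, 1]]
```

Using argmax rather than a threshold guarantees exactly one category per feature, even when no score is close to 1.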

I notice that sklearn.feature_extraction.DictVectorizer has an inverse_transform method. — kmace, Jun 08 '16 at 04:59
Just found this answer; it's very elaborate, but it may help you: http://stackoverflow.com/questions/22548731/how-to-reverse-sklearn-onehotencoder-transform-to-recover-original-data — Guiem Bosch, Jun 08 '16 at 05:58

bmjrowe · Answer 1 · 2018-01-22T15:36:27.403

For inverting a single one-hot encoded feature (one input column),
see: https://stackoverflow.com/a/39686443/7671913

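from sklearn.preprocessing import OneHotEncoder
import numpy as np

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

# Legacy API: active_features_ exists only in older scikit-learn releases
# (deprecated in 0.20, removed later); newer versions raise AttributeError below.
ohe = OneHotEncoder()
encoded = ohe.fit_transform(orig.reshape(-1, 1))  # input must be 2-D: one column per feature

# For a single integer feature, active_features_ holds the sorted values seen during fit
decoded = encoded.dot(ohe.active_features_).astype(int)
assert np.allclose(orig, decoded)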
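The `encoded.dot(ohe.active_features_)` trick works for a specific reason. For one non-negative integer column, the legacy encoder keeps one column per value actually seen, in ascending order. That makes `active_features_` the sorted list of distinct values, so multiplying a one-hot row by that vector picks out the row's value. The sketch below shows the same idea with numpy only. It uses `np.unique` in place of the encoder, an assumption made so the example runs without the legacy attribute.

```python
import numpy as np

orig = np.array([6, 9, 8, 2, 5, 4, 5, 3, 3, 6])

# Sorted distinct values play the role of active_features_; codes are column positions.
values, codes = np.unique(orig, return_inverse=True)

# One row per sample, one column per distinct value, exactly one 1 per row.
encoded = np.eye(len(values), dtype=int)[codes]

# The dot product with the value vector selects each row's original value.
decoded = encoded.dot(values)
assert np.array_equal(orig, decoded)
```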

For inverting an array of one-hot encoded items (several features at once), see the question linked in the comments: How to reverse sklearn.OneHotEncoder transform to recover original data?

Given a fitted sklearn.OneHotEncoder instance ohc, the encoded data out (the scipy.sparse.csr_matrix returned by ohc.fit_transform or ohc.transform), and the shape of the original data (n_samples, n_features), recover the original data X with:

# Requires the legacy attributes active_features_ and feature_indices_ (older scikit-learn only)
recovered_X = (np.array([ohc.active_features_[col] for col in out.sorted_indices().indices])
               .reshape(n_samples, n_features) - ohc.feature_indices_[:-1])
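Subtracting `feature_indices_[:-1]` works because of how the nonzero columns are ordered. Each row has exactly one nonzero column per feature, and with sorted indices those columns appear feature by feature. Reshaping therefore gives each sample's global column indices, and subtracting each feature's starting offset turns them back into values. Below is a numpy-only walk-through on the question's data. It assumes every value was seen during fitting (true here), so `active_features_` is simply `np.arange(9)`, and it uses `np.nonzero` in place of the sparse matrix's sorted indices.

```python
import numpy as np

# feature_indices_ as reported by the legacy encoder in the question.
feature_indices = np.array([0, 2, 5, 9])
active_features = np.arange(9)  # assumption: every value was seen during fit

# Encoded rows for [0, 1, 1] and [1, 0, 2].
out = np.array([
    [1., 0., 0., 1., 0., 0., 1., 0., 0.],
    [0., 1., 1., 0., 0., 0., 0., 1., 0.],
])
n_samples, n_features = 2, 3

# Row-major nonzero positions: one column index per feature, ascending within each row.
_, cols = np.nonzero(out)

# Map back to the full column space, reshape per sample, then remove block offsets.
recovered_X = (active_features[cols].reshape(n_samples, n_features)
               - feature_indices[:-1])
print(recovered_X.tolist())  # [[0, 1, 1], [1, 0, 2]]
```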

Since version 0.20 of scikit-learn, the `active_features_` attribute of the OneHotEncoder class has been deprecated. — gented, Jan 20 '20 at 11:25
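On scikit-learn 0.20 and later, the encoder can invert itself. `OneHotEncoder.inverse_transform` maps encoded rows back to the original categories, and `categories_` replaces `n_values_`/`feature_indices_` as the description of the column layout. The sketch below applies this to the question's data and assumes a scikit-learn version with these attributes. It passes `categories='auto'` explicitly so that 0.20/0.21 do not fall back to the legacy integer handling. It densifies with `toarray()` instead of using the `sparse`/`sparse_output` constructor argument, whose name differs between versions.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(categories='auto')
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# One array of categories per input feature, replacing n_values_ / feature_indices_.
print([cats.tolist() for cats in enc.categories_])  # [[0, 1], [0, 1, 2], [0, 1, 2, 3]]

encoded = enc.transform([[0, 1, 1]]).toarray()

# inverse_transform accepts the dense (or sparse) encoded matrix.
print(enc.inverse_transform(encoded).tolist())  # [[0, 1, 1]]
```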
