I had to one-hot encode 7 features, and the result is a sparse matrix. My questions are:
- Since I cannot see the actual data behind the sparse matrix, I had to scale my numeric features first, because the column indexes change after encoding. Is there any way to avoid creating a sparse matrix, so that I can keep working with column indexes?
- Will the ML model learn from a sparse matrix just fine?
- How do I avoid getting a sparse matrix when one-hot encoding multiple features? (I checked: if I encode only 2 columns, no sparse matrix is created, but with 7 it is.)
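The 2-vs-7 difference is likely explained by `ColumnTransformer`'s `sparse_threshold` parameter (default 0.3). When any transformer returns sparse output, as `OneHotEncoder` does by default, the stacked result stays sparse only if its overall density is below that threshold, and every extra one-hot encoded column lowers the density. A minimal sketch on made-up data (the array `X`, its column layout and the helper `encode` are hypothetical) showing a dense result, a sparse result, and dense output forced with `sparse_threshold=0`:

```python
import numpy as np
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 is numeric, columns 1-3 hold category codes 0-4.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=100),
    rng.integers(0, 5, size=(100, 3)),
])


def encode(cat_cols, **kwargs):
    """Hypothetical helper: one-hot encode cat_cols, pass the other columns through."""
    ct = ColumnTransformer(
        [("encoder", OneHotEncoder(), cat_cols)], remainder="passthrough", **kwargs
    )
    return ct.fit_transform(X)


# One encoded column (5 categories): density = (100 + 3*100) / (100*8) = 0.5 -> dense.
X_one = encode([1])
# Three encoded columns: density = (3*100 + 100) / (100*16) = 0.25 < 0.3 -> sparse.
X_three = encode([1, 2, 3])
# sparse_threshold=0 always returns a dense NumPy array.
X_three_dense = encode([1, 2, 3], sparse_threshold=0)

print(type(X_one).__name__, type(X_three).__name__, type(X_three_dense).__name__)
```

Alternatively, `OneHotEncoder(sparse_output=False)` (named `sparse=False` before scikit-learn 1.2) makes the encoder itself return a dense array, and `.toarray()` converts an existing sparse result for inspection. Dense output can use much more memory when features have many categories.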
Below is my code:

```python
# Standard scaling of the numeric columns 3 and 5
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
Xtrain[:, [3, 5]] = sc.fit_transform(Xtrain[:, [3, 5]])
Xtest[:, [3, 5]] = sc.transform(Xtest[:, [3, 5]])

# One-hot encoding of the categorical columns
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1, 2, 4, 6, 7])], remainder='passthrough')
Xtrain = ct.fit_transform(Xtrain)
Xtest = ct.transform(Xtest)  # transform only: reuse the encoding fitted on Xtrain
```