I am trying to encode and scale my dataframe using sklearn's pipelines. It's just returning a NumPy array instead of a DataFrame. Instead of making a hacky solution (which I am best at!), I was hoping there was an easier/standard way to get an encoded/scaled DataFrame back.
Here's a sample of the code I'm using to encode/scale:
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
num_attributes = list(train_set.select_dtypes(exclude=['object']))  # numeric columns: everything that isn't object-dtype
cat_attributes = list(train_set.select_dtypes(include=['object']))  # categorical columns: all object-dtype columns
cat_pipeline = Pipeline([
    ('imputer', SimpleImputer(fill_value='none', strategy='constant')),
    ('one_hot', OneHotEncoder())
])
full_pipeline = ColumnTransformer([
    ('num', StandardScaler(), num_attributes),
    ('cat', cat_pipeline, cat_attributes)
])
train_set_prepared = full_pipeline.fit_transform(train_set)
The result is a plain array (a SciPy sparse matrix here, since OneHotEncoder outputs sparse by default) rather than a DataFrame:
(0, 0) nan
(0, 1) -0.002676506826924531
(0, 2) nan
(0, 3) -0.03350622836892517
(0, 4) nan
(0, 5) -0.03294496247236749
(0, 6) 0.002534826949104915
Is there a way to easily transform it back into a DataFrame that is scaled/encoded?
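For reference, the kind of hacky reconstruction I'd rather avoid looks roughly like this. It's just a sketch, assuming a recent sklearn (>= 1.1, so every step in the pipeline defines get_feature_names_out()); train_set_prepared_df is a name I made up:

import pandas as pd

train_set_prepared = full_pipeline.fit_transform(train_set)

# OneHotEncoder emits sparse output by default, so densify first
if hasattr(train_set_prepared, 'toarray'):
    train_set_prepared = train_set_prepared.toarray()

# Rebuild a DataFrame with the generated column names and the original row index
train_set_prepared_df = pd.DataFrame(
    train_set_prepared,
    columns=full_pipeline.get_feature_names_out(),
    index=train_set.index,
)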