Training dataset total categorical columns: 27
Test dataset total categorical columns: 27
OH_encoder = OneHotEncoder(handle_unknown='ignore', sparse=False)
OH_cols_test = pd.DataFrame(OH_encoder.fit_transform(X_test[test_low_cardinality_cols]))
After encoding, when preparing the test data for prediction, the column counts no longer match:
number of columns from test data: 115
number of columns from train data: 122
I checked the cardinality of the test data: a few columns have fewer distinct values than the same columns in the training data.
Train_data.column#1: 2, Test_data.column#1: 1
Train_data.column#2: 5, Test_data.column#2: 3
and so on for other columns.
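A minimal sketch of this kind of cardinality check, for anyone who wants to reproduce it. The toy frames and the column names `col1` and `col2` are hypothetical stand-ins for my real data, chosen so the counts match the first pair above.

```python
import pandas as pd

# Hypothetical toy data standing in for the real X_train / X_test.
X_train = pd.DataFrame({"col1": ["a", "b", "a", "b"],
                        "col2": ["p", "q", "r", "s"]})
X_test = pd.DataFrame({"col1": ["a", "a"],
                       "col2": ["p", "q"]})

cat_cols = ["col1", "col2"]  # assumed list of categorical columns

# Number of distinct values per categorical column, side by side.
cardinality = pd.DataFrame({
    "train": X_train[cat_cols].nunique(),
    "test": X_test[cat_cols].nunique(),
})
print(cardinality)
```

Any row where the test count is lower than the train count marks a column that will yield fewer one-hot columns if the test data is encoded independently.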
Because some categories never appear in the test data, one-hot encoding it produces fewer columns than the training data. Is there a better way to prepare the test dataset without any data loss?
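One approach I am considering, sketched under assumptions: fit the encoder on the training data only, then call `transform` (not `fit_transform`) on the test data, so both frames get the same set of columns. The toy frames and the name `low_cardinality_cols` are hypothetical. I call `.toarray()` on the default sparse output instead of passing `sparse=False`, since that argument's name differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; "z" in the test set never appears in training.
X_train = pd.DataFrame({"col1": ["a", "b", "a", "b"],
                        "col2": ["p", "q", "r", "p"]})
X_test = pd.DataFrame({"col1": ["a", "a"],
                       "col2": ["p", "z"]})
low_cardinality_cols = ["col1", "col2"]  # assumed column list

# Learn the categories from the training data only.
OH_encoder = OneHotEncoder(handle_unknown="ignore")
OH_encoder.fit(X_train[low_cardinality_cols])

# Apply the same fitted encoder to both sets.
OH_cols_train = pd.DataFrame(
    OH_encoder.transform(X_train[low_cardinality_cols]).toarray(),
    index=X_train.index,
)
OH_cols_test = pd.DataFrame(
    OH_encoder.transform(X_test[low_cardinality_cols]).toarray(),
    index=X_test.index,
)

print(OH_cols_train.shape[1], OH_cols_test.shape[1])
```

With `handle_unknown='ignore'`, a test category that was not seen during fitting is encoded as all zeros for that feature, so no extra columns appear and none go missing.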