My data set contains continuous variables (car price, odometer), ordinal variables (car age, sale year), and categorical variables (manufacturer, country, color). My goal is to build a model that predicts cars' sale prices on the second-hand market.
I encoded all categorical variables as dummy variables, so my data set now contains many (100+) 0-1 variables, and I plan to reduce the dimensionality to speed things up. My problem is: which method should I use?
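For context, the encoding step I did can be sketched with pandas; the column names and values below are hypothetical stand-ins for my actual data:

```python
import pandas as pd

# Hypothetical sample of the car data set described above
df = pd.DataFrame({
    "price": [15000, 9000, 22000],
    "odometer": [42000, 120000, 8000],
    "age": [3, 8, 1],
    "manufacturer": ["Toyota", "Ford", "BMW"],
    "country": ["Japan", "USA", "Germany"],
    "color": ["red", "blue", "black"],
})

# One-hot encode the categorical columns: each level becomes its own 0-1 column
encoded = pd.get_dummies(df, columns=["manufacturer", "country", "color"])
print(encoded.shape)  # (3, 12): 3 numeric columns + 9 dummy columns
```

With the real data set (many manufacturers, countries, and colors) this is how the column count grows past 100.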
My first choice is PCA. However, it is a method designed for continuous variables, so I probably should not use it when my data set contains so many dummy variables.
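For reference, the standard PCA usage on the continuous columns alone would look like this (the array is hypothetical; scaling first matters because price and odometer live on very different scales):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical continuous features: [price, odometer]
X = np.array([
    [15000.0, 42000.0],
    [9000.0, 120000.0],
    [22000.0, 8000.0],
    [12000.0, 90000.0],
])

# Standardize first: PCA directions are driven by variance,
# so unscaled odometer values would dominate price.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)                 # (4, 1)
print(pca.explained_variance_ratio_)  # fraction of variance kept
```

The question is whether this same machinery is meaningful once most columns are 0-1 dummies rather than continuous measurements.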
My second choice is CATPCA (categorical PCA). To be honest, I know little about this method; it seems applicable to my data set, but I don't know how to implement it in Python.
My third idea is to split my data set and apply different methods to each part. For example, I split it into two sets, a continuous-variable set and a dummy-variable set, run PCA on the continuous set and CATPCA on the dummy set, and then concatenate the two reduced sets. But I have no theory to support this idea.
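A way to prototype this split-and-combine idea (with no claim that it is theoretically justified): PCA on the continuous block and, as a stand-in for CATPCA, which has no standard scikit-learn implementation, TruncatedSVD on the 0-1 block, since it works directly on binary/sparse matrices. All data below are synthetic placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 50 cars, 3 continuous columns and 20 dummy columns
X_cont = rng.normal(size=(50, 3))
X_dummy = (rng.random(size=(50, 20)) > 0.7).astype(float)

# Reduce each block separately with a method suited to its type
cont_reduced = PCA(n_components=2).fit_transform(
    StandardScaler().fit_transform(X_cont)
)
dummy_reduced = TruncatedSVD(n_components=5, random_state=0).fit_transform(X_dummy)

# Concatenate the two reduced blocks into one feature matrix
X_combined = np.hstack([cont_reduced, dummy_reduced])
print(X_combined.shape)  # (50, 7)
```

The mechanics clearly work; my worry is whether the resulting mixed feature space is statistically meaningful.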
The final idea is to try all of the above methods and choose whichever performs best on the validation set. However, according to this question, even if the PCA method gets the best result, that result may be less meaningful, so validation performance alone may not be a good criterion for my problem.
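For completeness, the validation comparison I have in mind could be sketched like this (synthetic data; the two pipelines are hypothetical stand-ins for the candidate methods). This measures predictive value only, which is exactly the limitation above: it says nothing about whether the reduced components are meaningful.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Synthetic stand-in for the encoded car data: 200 rows, 30 features,
# with a hypothetical price signal on the first feature
X = rng.normal(size=(200, 30))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

candidates = {
    "no_reduction": make_pipeline(StandardScaler(), LinearRegression()),
    "pca_10": make_pipeline(StandardScaler(), PCA(n_components=10),
                            LinearRegression()),
}

# Compare candidates by cross-validated R^2 (higher is better)
results = {name: cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()
           for name, pipe in candidates.items()}
print(results)
```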
So, which dimensionality reduction method should I use on my data set?