I would like to encode 5 categorical columns in my dataset, but the issue is that each of these columns has many unique values.
If I encode them with a label encoder, I introduce an artificial order that does not exist in the data, whereas if I use one-hot encoding (OHE) or pd.get_dummies, I end up with a large number of features, which adds too much sparseness to the data.
I am currently dealing with a supervised learning problem and the following are the unique values per column:
Job_Role : Unique categorical values = 29
Country : Unique categorical values = 12
State : Unique categorical values = 14
Segment : Unique categorical values = 12
Unit : Unique categorical values = 10
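To make the trade-off concrete, here is a minimal sketch of the two encodings on synthetic data. Only the column names and cardinalities come from my dataset; the category values and the row count are made up for illustration, and I use pandas category codes as a stand-in for a label encoder.

```python
import pandas as pd

# Cardinalities from the list above; the category values themselves are made up.
cardinalities = {"Job_Role": 29, "Country": 12, "State": 14, "Segment": 12, "Unit": 10}

n_rows = 100  # hypothetical row count, large enough that every category appears

# Cycle through k placeholder values per column so each column has exactly k uniques.
df = pd.DataFrame({
    col: [f"{col}_{i % k}" for i in range(n_rows)]
    for col, k in cardinalities.items()
})

# Label-style encoding: one integer column per feature,
# but the integers suggest an order that does not exist.
label_encoded = df.apply(lambda s: s.astype("category").cat.codes)

# One-hot encoding: one indicator column per category value.
one_hot = pd.get_dummies(df, columns=list(cardinalities))

print(label_encoded.shape[1])  # 5 columns
print(one_hot.shape[1])        # 29 + 12 + 14 + 12 + 10 = 77 columns
```

So label-style encoding keeps 5 columns but imposes an order, while one-hot encoding turns the same 5 columns into 77 mostly-zero columns.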
I have already looked into multiple references but am still not sure about the best approach. What should I do in this situation to get the fewest features with the maximum positive impact on my model?