Consider a dummy dataframe:

A  B  C   D   …  Z
1  2  as  we  …  2
2  4  qq  rr  …  5
4  5  tz  rc  …  9

This dataframe has 25 independent variables and one target variable. The independent variables are a mixture of high-cardinality categorical features, low-cardinality categorical features, and numerical features; the target variable is numerical. I first want to select (filter) the variables that are helpful in predicting the target variable. Any suggestions or tips toward achieving this goal are appreciated. I hope my question is clear; if it is not, I welcome suggestions for improving it.

What have I tried so far? I applied target mean encoding (smoothed mean) to the categorical features with respect to the target variable, then fitted a random forest to get variable importances. The weird thing is that the random forest assigns nearly all of the importance to a single feature every time, whereas I expected at least 3-4 meaningful variables. I also tried neural networks, but the result is no different. What could be the reason for this? What does it mean if the algorithms use only one variable? The test predictions are also not very accurate: the RMSE is about 2.4, while the target variable usually ranges from 20 to 40. Thank you for your patience in reading this.

P.S. I am using scikit-learn in Python.
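The steps described above can be sketched roughly as follows. This is only a minimal illustration on synthetic data, not the actual code or data from the question: the columns `A`, `B`, `C`, `D`, `Z`, the helper `smooth_mean_encode`, and the smoothing weight of 10 are all assumptions made for the example. The encoding is fitted on the training split only, so the test rows never see their own target values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def smooth_mean_encode(train, col, target, weight=10.0):
    """Hypothetical helper: smoothed target mean per category, fitted on `train` only."""
    global_mean = train[target].mean()
    stats = train.groupby(col)[target].agg(["count", "mean"])
    # Shrink each category mean toward the global mean; rare categories shrink more.
    smooth = (stats["count"] * stats["mean"] + weight * global_mean) / (
        stats["count"] + weight
    )
    return smooth, global_mean


# Synthetic stand-in for the dataframe in the question (assumed structure).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "A": rng.normal(size=n),                                  # numerical
    "B": rng.normal(size=n),                                  # numerical
    "C": rng.choice([f"c{i}" for i in range(200)], size=n),   # high cardinality
    "D": rng.choice(["x", "y", "z"], size=n),                 # low cardinality
})
df["Z"] = 30 + 3 * df["A"] + 2 * (df["D"] == "x") + rng.normal(size=n)

train, test = train_test_split(df, test_size=0.25, random_state=0)
train, test = train.copy(), test.copy()

# Encode the categorical columns using statistics from the training split only.
for col in ["C", "D"]:
    mapping, global_mean = smooth_mean_encode(train, col, "Z")
    train[col] = train[col].map(mapping)
    # Categories unseen during training fall back to the global mean.
    test[col] = test[col].map(mapping).fillna(global_mean)

features = ["A", "B", "C", "D"]
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train["Z"])

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(model.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)
rmse = np.sqrt(mean_squared_error(test["Z"], model.predict(test[features])))

print(importances)
print(f"test RMSE: {rmse:.2f}")
```

Comparing the importance assigned to the encoded high-cardinality column `C` here (which is pure noise by construction) against the test RMSE is one way to check whether the encoding itself is carrying information about the training targets into the model.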

Chinti