I have a dataset which has 5 variables and 1 response. The variables are discrete. I want to find the key variable and its value which leads to a significant increase or decrease to the response.
Are there any methods for finding the value of variable which has significant influence on response?
1 Answers
You will need to perform some statistical tests in order to find which variables are the most significant.
If you are familiar with python you could use SelectKBest from scikit-learn. It will give you a score, the highest the score, the stronger the link between the feature and the output.
Additionally you can train an explainable ML model, strong enough to converge, and find the pattern within the data, from that you could compute the feature importance.
For example you could use DecisionTreeClasifier from scikit-learn. It has a decision_path class function that will plot the decision path taken by the tree, decision_path has a property called feature_importances_ that uses Gini coefficient to compute the importance of the features.
Last but not the least, you can use feature reduction techniques, such as PCA, it's used to find the variance between variables, from the PCA you will compute new Principal Components that are linked to the features, from the most explenatory ones you can find the features importance. Check this stack overflow answer that explains everything you should know for that.

- 1,220
- 5
- 9