
I have a dataset with many missing values, and I created five imputed datasets using MICE in R. I want to fit a classification model and use a feature selection method to identify the most important variables. Is it possible to fit a machine learning model to each imputed dataset and identify the most important variables across all of them?

I can fit a machine learning model to each of the datasets, but I don't know how to pool the results to get a final model or a feature ranking across all imputed datasets. Is simply averaging the feature rankings from the individual datasets a valid way to get the final ranking?
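For concreteness, the averaging approach described above might look like the following sketch. It is only an illustration under several assumptions that are not part of my actual setup: the built-in `nhanes` data from `mice` stands in for the real dataset, `hyp` is treated as the class label, the model is a random forest from the `randomForest` package, and importance is measured by mean decrease in Gini impurity. The names `rank_one`, `ranks`, and `mean_rank` are made up for the example.

```r
library(mice)
library(randomForest)

set.seed(1)

# Five imputed datasets from the example data shipped with mice
imp <- mice(nhanes, m = 5, maxit = 5, seed = 1, printFlag = FALSE)

# Fit one random forest per completed dataset and rank the predictors
# by mean decrease in Gini impurity (rank 1 = most important).
rank_one <- function(i) {
  d <- mice::complete(imp, action = i)
  d$hyp <- factor(d$hyp)  # assumed class label for this illustration
  fit <- randomForest(hyp ~ ., data = d, ntree = 500)
  gini <- importance(fit)[, "MeanDecreaseGini"]
  rank(-gini)
}

# Rows are predictors, columns are imputed datasets
ranks <- sapply(seq_len(imp$m), rank_one)

# Average rank of each predictor across the imputed datasets
mean_rank <- sort(rowMeans(ranks))
print(mean_rank)
```

This produces one averaged ranking, but it says nothing about how much the rankings disagree between imputations, which is part of what I am unsure about.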

Is there a proper way of pooling the results?

  • Can you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of an example dataset (e.g., `mice(nhanes, maxit = 5)`) and any code you've tried so far, even if it doesn't fully work? – jrcalabrese Feb 04 '23 at 17:00
