
I know most people perform feature selection by running RFE on, for example, a linear regression model BEFORE training the model with Keras.

However, is it possible to do it within the training procedure of the deep neural network? If so, how? Are there any downsides to it?

Caterina

2 Answers


I think a deep neural network already performs a kind of feature selection during training: it learns to focus on the most informative inputs. Keeping irrelevant features, however, can lead to overfitting and/or make convergence slower.
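One way to make that implicit selection explicit is an L1 penalty on the first layer: it pushes the weights attached to uninformative inputs toward zero, and their summed magnitudes can then be read as rough importances. A minimal sketch (the penalty strength `1e-3` and layer sizes are arbitrary choices, not from the question):

```python
import numpy as np
import tensorflow as tf

n_features = 20

# L1 regularization on the first layer's kernel drives the weights of
# uninformative inputs toward zero -- a soft, built-in feature selection
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l1(1e-3)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# after model.fit(...), a per-feature importance score can be read off
# as the summed absolute first-layer weights (shape: (n_features,))
importance = np.abs(model.layers[0].get_weights()[0]).sum(axis=1)
```

This only shrinks weights; it does not reduce the input dimensionality itself, so it will not help if training with all features is infeasible in the first place.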

Kiwi
  • Problem is that training with my number of features is unfeasible, so I need to do some pre-selection of features; however, doing it before training may lead to data leakage. I wanted some sort of within-NN pre-filtering, but I'm not sure this is possible or even available. I know that the weights of features that are not contributing would end up close to zero, but then again I cannot train with this many features due to the "curse of dimensionality" (too many features for too few samples) – Caterina Oct 20 '22 at 14:23
  • I think the feature selection has to be done prior to training. Maybe look into Variable Selection Networks (VSN), proposed by Bryan Lim et al. – Kiwi Oct 20 '22 at 14:28
  • Thanks, will do, and I'll give you the bounty if it works. I already gave you an upvote :) – Caterina Oct 20 '22 at 14:32
  • Does this work only for classification, or can it be used for regression too? Do you know why they increase the dimensionality of the inputs? I created this new question: https://stackoverflow.com/questions/74553776/gated-residual-and-variable-selection-networks-for-regression – Caterina Nov 24 '22 at 15:13
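As a sketch of the variable-selection idea the comments point at: a learned per-feature gate can be dropped in front of any dense stack, for regression as well as classification. This is a much-simplified stand-in for the idea behind VSN, not the full network from Lim et al.; the layer name `FeatureGate` is my own:

```python
import tensorflow as tf

class FeatureGate(tf.keras.layers.Layer):
    """Learned per-feature weights (a simplified variable-selection gate)."""

    def build(self, input_shape):
        # one trainable logit per input feature
        self.logits = self.add_weight(
            name="gate_logits", shape=(input_shape[-1],),
            initializer="zeros", trainable=True)

    def call(self, x):
        # softmax makes the gates sum to 1, so after training they can
        # be read as relative feature importances
        return x * tf.nn.softmax(self.logits)
```

Usage: place `FeatureGate()` as the first layer of the model; after training, `tf.nn.softmax(gate.logits)` gives the learned importance of each input.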

One way to do this would be with cross-validation: you could train more than one model per fold, performing the feature selection inside each fold; you would then need to adapt each model to its selected features.

I don't know if there's a way to do it while training a NN, but I suppose it would be per training epoch.

The drawbacks are the complexity of the implementation and the increase in computation time from the extra validation/training runs.
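A sketch of the per-fold approach with scikit-learn (synthetic data and the choice of 10 features are assumptions for illustration). The key point is that RFE is fit on the training fold only, which avoids the data leakage raised in the comments above:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=100, n_features=30, random_state=0)

for train_idx, test_idx in KFold(n_splits=5).split(X):
    # fit the selector on the training fold only -> no leakage
    rfe = RFE(LinearRegression(), n_features_to_select=10)
    rfe.fit(X[train_idx], y[train_idx])
    X_tr = rfe.transform(X[train_idx])   # selected columns, train fold
    X_te = rfe.transform(X[test_idx])    # same columns, test fold
    # ...train the (Keras or other) model on X_tr, evaluate on X_te
```

Each fold may select a different feature subset, which is why the model has to be re-built per fold, as the answer notes.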

gaspar