Is there any way to remove a specific feature from a scikit-learn dataset? For example, I know it is possible to remove features using sklearn.feature_selection, but these are all automated procedures that remove the features they decide are useless. Is there any way to implement a custom feature removal algorithm without going into the dirty insides of the data? For example, say I have a function that scores features; a toy example is provided here:
def score(feature_index):
    return 0 if feature_index == 1 else 1
Now say I want to remove all those features in the iris dataset that score less than 0.5. I want to do something like this:
from sklearn import datasets
iris = datasets.load_iris()
#this is the function I want:
iris.filter_features(score, threshold=0.5)
after which I would like the iris dataset to have one fewer feature. Right now, I can do it like so:
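import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
# iterate in reverse so that deleting a column does not shift the indices still to be checked
for feature_index in reversed(range(len(iris.feature_names))):
    if score(feature_index) < 0.5:
        iris.feature_names.pop(feature_index)
        iris.data = np.delete(iris.data, feature_index, 1)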
but this looks... dirty.
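To make the intent concrete, here is a rough sketch of the kind of helper I have in mind. The name select_features and its signature are hypothetical (as far as I know nothing like it exists in scikit-learn), and it only works out which feature indices and names to keep, using the iris feature names written out by hand so that it runs without any dependencies:

```python
def select_features(feature_names, score, threshold=0.5):
    """Hypothetical helper: return (kept_indices, kept_names) for every
    feature whose score is at least `threshold`."""
    # Score each feature by its original index, before anything is removed.
    keep = [i for i in range(len(feature_names)) if score(i) >= threshold]
    return keep, [feature_names[i] for i in keep]


def score(feature_index):
    # Toy scoring function from above: only feature 1 scores below 0.5.
    return 0 if feature_index == 1 else 1


# The four iris feature names, as found in iris.feature_names.
names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

keep, kept_names = select_features(names, score, threshold=0.5)
print(keep)        # indices of the features that survive
print(kept_names)  # their names, in the original order
```

With the real dataset, the result could presumably be applied in two assignments, iris.data = iris.data[:, keep] and iris.feature_names = kept_names, since NumPy accepts a list of column indices. That avoids mutating anything inside a loop, but it is still me reaching into the dataset by hand.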