I would like to perform feature selection by a wrapper method on the iris data set using mlr package, however I would like to look only at groups of features associated with Petal and/or Sepal. So instead of looking at 4 features in different combinations the wrapper routine would look at two groups of features in different combinations.
The mlr documentation states this can be performed using two arguments bit.names
and bit.to.feature
:
bit.names [character] Names of bits encoding the solutions. Also defines the total number of bits in the encoding. Per default these are the feature names of the task.
bits.to.features [function(x, task)] Function which transforms an integer-0-1 vector into a character vector of selected features. Per default a value of 1 in the ith bit selects the ith feature to be in the candidate solution.
I could not find any examples of usage of these two arguments in mlr tutorials or elsewhere.
I will use the example provided in ?mlr::selectFeatures
.
First operating on all the features
library(mlr)
rdesc <- makeResampleDesc("Holdout")
ctrl <- makeFeatSelControlSequential(method = "sfs",
maxit = NA)
res <- selectFeatures("classif.rpart",
iris.task,
rdesc,
control = ctrl)
analyzeFeatSelResult(res)
This works as expected
In order to run over groups of features I design a 0/1 matrix to map features to groups (I am not sure if this is the way to go, it just seemed logical):
mati <- rbind(
c(0,0,1,1),
c(1,1,0,0))
rownames(mati) <- c("Petal", "Sepal")
colnames(mati) <- getTaskFeatureNames(iris.task)
the matrix looks like:
Sepal.Length Sepal.Width Petal.Length Petal.Width
Petal 0 0 1 1
Sepal 1 1 0 0
and now I run:
res <- selectFeatures("classif.rpart",
iris.task,
rdesc,
control = ctrl,
bit.names = c("Petal", "Sepal"),
bits.to.features = function(x = mati, task) mlr:::binaryToFeatures(x, getTaskFeatureNames(task)))
analyzeFeatSelResult(res)
#output
Features : 1
Performance : mmce.test.mean=0.0200000
Sepal
Path to optimum:
- Features: 0 Init : Perf = 0.66 Diff: NA *
- Features: 1 Add : Sepal Perf = 0.02 Diff: 0.64 *
Stopped, because no improving feature was found.
This appears to perform what I need but I am not quite sure I defined bits.to.features
argument correctly.
But when I try to use the same approach in a wrapper:
outer <- makeResampleDesc("CV", iters = 2L)
inner <- makeResampleDesc("Holdout")
ctrl <- makeFeatSelControlSequential(method = "sfs",
maxit = NA)
lrn <- makeFeatSelWrapper("classif.rpart",
resampling = inner,
control = ctrl,
bit.names = c("Petal", "Sepal"),
bits.to.features = function(x = mati, task) mlr:::binaryToFeatures(x, getTaskFeatureNames(task)))
r <- resample(lrn, iris.task, outer, extract = getFeatSelResult)
I receive an error:
Resampling: cross-validation
Measures: mmce
[FeatSel] Started selecting features for learner 'classif.rpart'
With control class: FeatSelControlSequential
Imputation value: 1
[FeatSel-x] 1: 00 (0 bits)
[FeatSel-y] 1: mmce.test.mean=0.7200000; time: 0.0 min
[FeatSel-x] 2: 10 (1 bits)
[FeatSel-y] 2: mmce.test.mean=0.0800000; time: 0.0 min
[FeatSel-x] 2: 01 (1 bits)
[FeatSel-y] 2: mmce.test.mean=0.0000000; time: 0.0 min
[FeatSel-x] 3: 11 (2 bits)
[FeatSel-y] 3: mmce.test.mean=0.0800000; time: 0.0 min
[FeatSel] Result: Sepal (1 bits)
Error in `[.data.frame`(df, , j, drop = drop) :
undefined columns selected
What am I doing wrong and what is the correct usage of bit.names
and bit.to.feature
arguments?
Thanks
EDIT: I posted an issue on mlr github: https://github.com/mlr-org/mlr/issues/2468