I am using Isolation Forest in R to perform Anomaly Detection on multivariate data.
I tried to calculate the anomaly scores along with the contribution of each individual metric to that score. I am able to get the anomaly score, but I am having trouble calculating the importance of the metrics.
I am able to get the desired result through BigML (an online platform) but not through R.
R code:
> library(solitude) # tried 'IsolationForest' and 'h2o' but not getting desired result
> mo = isolation_forest(data)
> final_scores <- predict(mo,data)
> summary(mo)
       Length Class  Mode
forest 14     ranger list
> head(final_scores,5)
[1] 0.4156554 0.3923926 0.4262782 0.4595296 0.4174865
I want to get the importance values for every metric (a, b, c, d) through R code, just like what I am getting in BigML.
I think I am missing some basic parameters. I am new to R, so I have not been able to figure it out.
I have thought of a way to get the feature importance at the observation level, but I am having trouble implementing it.
Here is the snippet of what I am planning.
The dots in the metric are individual observations while the lines are splits based on specific variables.
I am able to trace individual trees of the forest, but there are 500 trees in the forest, so tracing each tree by hand and reading off its importance values is impractical. The example below is based purely on dummy data.
Output of individual tree:
> x = treeInfo(mo$forest,tree=3)
> x
   nodeID leftChild rightChild splitvarID splitvarName  splitval terminal prediction
1       0         1          2          2            c 0.6975663    FALSE         NA
2       1         3          4          1            b 0.3455875    FALSE         NA
3       2         5          6          0            a 0.2620023    FALSE         NA
4       3         7          8          0            a 0.1425075    FALSE         NA
5       4         9         10          0            a 0.6611566    FALSE         NA
6       5        NA         NA         NA         <NA>        NA     TRUE         10
7       6        NA         NA         NA         <NA>        NA     TRUE          2
8       7        NA         NA         NA         <NA>        NA     TRUE          6
9       8        NA         NA         NA         <NA>        NA     TRUE          1
10      9        NA         NA         NA         <NA>        NA     TRUE          3
11     10        NA         NA         NA         <NA>        NA     TRUE          5
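To make the idea concrete, here is a minimal sketch of the per-tree step I have in mind, using only base R. It rebuilds the tree above by hand, follows one observation from the root to its leaf, and counts how often each metric was used to split along that path. The function name trace_path and the observation values are made up for illustration, and I am assuming ranger's convention that values less than or equal to splitval go to the left child. I do not know whether BigML computes its importance values this way; split counts are only a rough proxy.

```r
# Tree 3 from the output above, rebuilt by hand so the sketch runs on its own.
tree <- data.frame(
  nodeID       = 0:10,
  leftChild    = c(1, 3, 5, 7, 9, rep(NA, 6)),
  rightChild   = c(2, 4, 6, 8, 10, rep(NA, 6)),
  splitvarName = c("c", "b", "a", "a", "a", rep(NA, 6)),
  splitval     = c(0.6975663, 0.3455875, 0.2620023, 0.1425075, 0.6611566,
                   rep(NA, 6)),
  terminal     = c(rep(FALSE, 5), rep(TRUE, 6)),
  stringsAsFactors = FALSE
)

# Follow one observation from the root (nodeID 0) to its leaf and return the
# names of the variables it was split on, in order.
# Assumption: values <= splitval go to the left child (ranger's convention).
trace_path <- function(tree, obs) {
  node <- 0
  vars <- character(0)
  repeat {
    row <- tree[tree$nodeID == node, ]
    if (row$terminal) break
    v <- as.character(row$splitvarName)
    vars <- c(vars, v)
    node <- if (obs[[v]] <= row$splitval) row$leftChild else row$rightChild
  }
  vars
}

# Hypothetical observation (made-up values, not taken from my data).
obs <- c(a = 0.10, b = 0.20, c = 0.50)

# Variables used on this observation's path, then a count per metric.
path_vars <- trace_path(tree, obs)
counts <- table(factor(path_vars, levels = c("a", "b", "c", "d")))
print(path_vars)
print(counts)
```

For the whole forest, I imagine the same function could be applied to treeInfo(mo$forest, tree = i) for each i in seq_len(mo$forest$num.trees) and the counts summed per observation, but I have not verified that this reproduces anything like the BigML values.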
Any kind of help is appreciated.