tl;dr: it is a special kind of decision tree, called an Isolation Tree (iTree) in the original paper:
We show in this paper that a tree structure can be constructed effectively to isolate every single instance. [...] This isolation characteristic of tree forms the basis of our method to detect anomalies, and we call this tree Isolation Tree or iTree.
The proposed method, called Isolation Forest or iForest, builds an ensemble of iTrees for a given data set [...]
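You can verify this directly in scikit-learn by inspecting the fitted ensemble's `estimators_` attribute; note that scikit-learn happens to implement the iTree via its `ExtraTreeRegressor` class (an implementation detail, not part of the paper):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = rng.randn(100, 2)  # toy data: 100 points, 2 features

# Fit a small forest of iTrees
iso = IsolationForest(n_estimators=10, random_state=42).fit(X)

# Each member of the ensemble is a randomized tree under the hood
print(type(iso.estimators_[0]).__name__)  # ExtraTreeRegressor
print(len(iso.estimators_))               # 10
```
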
All ensemble methods, to which family Isolation Forest belongs, are built from base estimators; that is exactly what an ensemble is. From the scikit-learn user guide:
The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.
For example, in Random Forest (which arguably was the inspiration for the name Isolation Forest), this base estimator is a simple decision tree:
n_estimators : int, default=100
The number of trees in the forest.
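A quick sketch showing that the members of a fitted Random Forest really are plain decision trees (again via the `estimators_` attribute):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# Every base estimator in the forest is a DecisionTreeClassifier
print(all(isinstance(t, DecisionTreeClassifier) for t in rf.estimators_))  # True
print(len(rf.estimators_))  # 5, matching n_estimators
```
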
The same holds for algorithms like Gradient Boosted Trees (the scikit-learn docs refer to the iterations as "boosting stages", but the base estimators are decision trees nevertheless), Extra Trees, etc.
In all these algorithms the base estimator is fixed (although its specific parameters can vary, as set via the ensemble's arguments). There is another category of ensemble methods, where the exact model to be used as the base estimator can also be set via a respective base_estimator argument; for example, here is the Bagging Classifier:
base_estimator : object, default=None
The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a decision tree.
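The default is easy to check empirically; the sketch below leaves the argument unset (which also sidesteps the fact that newer scikit-learn versions renamed base_estimator to estimator), so the ensemble falls back to a decision tree:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No base estimator specified, so bagging falls back to its default
bag = BaggingClassifier(n_estimators=10, random_state=0).fit(X, y)

print(type(bag.estimators_[0]).__name__)  # DecisionTreeClassifier
```
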
and AdaBoost:
base_estimator : object, default=None
The base estimator from which the boosted ensemble is built. [...] If None, then the base estimator is DecisionTreeClassifier(max_depth=1).
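Again, this default is directly observable on a fitted model; with no base estimator specified, every member of the boosted ensemble comes out as a decision stump, i.e. a tree of depth 1:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

X, y = load_iris(return_X_y=True)

# Default settings: the base estimator is DecisionTreeClassifier(max_depth=1)
ada = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)

print(type(ada.estimators_[0]).__name__)  # DecisionTreeClassifier
print(ada.estimators_[0].get_depth())     # 1 (a decision stump)
```
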
Historically speaking, the first ensembles were built using various versions of decision trees, and arguably still today it is decision trees (or variants, like iTrees) that are almost exclusively used for such ensembles; quoting from another answer of mine in Execution time of AdaBoost with SVM base classifier:
Adaboost (and similar ensemble methods) were conceived using decision trees as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is a good reason why, still today, if you don't explicitly specify the base_estimator argument, it assumes a value of DecisionTreeClassifier(max_depth=1). DTs are suitable for such ensembling because they are essentially unstable classifiers, which is not the case with SVMs, hence the latter are not expected to offer much when used as base classifiers.
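The instability claim above can be illustrated with a small experiment: train two fully-grown trees on two bootstrap resamples of the same data and observe that their predictions disagree on some points; this variance is exactly what averaging over an ensemble reduces (the dataset and seeds below are arbitrary illustration choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.RandomState(0)

preds = []
for _ in range(2):
    idx = rng.randint(0, len(X), len(X))  # bootstrap resample (with replacement)
    tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    preds.append(tree.predict(X))

# Fully-grown trees fitted on slightly different samples of the same data
# end up disagreeing on a fraction of the points
disagreement = np.mean(preds[0] != preds[1])
print(disagreement > 0)  # True
```
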