I don't have access to the original texts,1,2 but using some secondary sources, the key differences between these recursive ("greedy") partitioning ("tree") algorithms seem to be:
Type of learning:
- ID3, the "Iterative Dichotomiser," handles classification only (no regression); it splits on categorical attributes, creating one branch per attribute value
- CART, or "Classification And Regression Trees," is a family of algorithms (including, but not limited to, binary classification tree learning). With rpart(), you can specify method='class' or method='anova' explicitly, but rpart can infer this from the type of the dependent variable (i.e., factor or numeric).
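For intuition, here is a minimal Python sketch of that dispatch, i.e., picking the task from the response type. This is not rpart's actual source, and infer_method is a hypothetical helper; it just mirrors the factor-vs-numeric rule described above.

```python
def infer_method(y):
    """Hypothetical sketch of rpart-style method inference:
    numeric responses -> regression ('anova'), otherwise
    factor-like (categorical) responses -> classification ('class')."""
    # bool is excluded because True/False labels behave like a 2-level factor
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in y):
        return "anova"
    return "class"

print(infer_method([3.2, 1.5, 2.8]))   # numeric response -> "anova"
print(infer_method(["spam", "ham"]))   # categorical response -> "class"
```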
Loss function used for split selection:
- ID3, as other comments have mentioned, selects its splits based on Information Gain: the parent node's entropy minus the size-weighted sum of the child nodes' entropies
- CART, when used for classification, selects the split whose resulting subsets minimize the (size-weighted) Gini impurity
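Both criteria are easy to compute by hand. A small, self-contained Python sketch (toy labels; the function names are mine, not from either paper):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: chance of mislabeling a random draw using the
    node's own class frequencies."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = ["yes"] * 5 + ["no"] * 5
left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4

print(entropy(parent))                           # 1.0 bit for a 50/50 node
print(gini(parent))                              # 0.5 for a 50/50 node
print(information_gain(parent, [left, right]))   # ≈ 0.278 bits
```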
Anecdotally, as a practitioner, I hardly ever hear the term ID3 used, whereas CART is often used as a catch-all term for decision trees. CART has a very popular implementation in R's rpart package. ?rpart notes that "In most details it follows Breiman et al. (1984) quite closely." However, you can pass rpart(..., parms=list(split='information')) to override the default behavior and split on information gain instead.
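Mechanically, either criterion plugs into the same greedy search: score every candidate split, keep the one with the lowest weighted child impurity. A minimal Python sketch (toy data; best_split is my own illustrative helper, not library code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys, impurity):
    """Greedy 1-feature split: return the threshold on xs that minimizes
    the size-weighted impurity of the two resulting children."""
    n = len(ys)
    best = None
    for t in sorted(set(xs))[:-1]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if best is None or score < best[0]:
            best = (score, t)
    return best[1]

xs = [1, 2, 3, 4, 5, 6]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys, gini))     # 3: the clean cut wins under Gini
print(best_split(xs, ys, entropy))  # 3: and under entropy here as well
```

On clean toy data like this the two criteria agree; in practice they usually produce very similar trees, which is why the choice of split parameter rarely changes rpart's output much.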
1 Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
2 Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.