DATA I have a 5-MB, 150k-row dataset that I am trying to analyze using decision trees with R and the rpart package: http://www.mediafire.com/download/3x2b3r9ccj8r1gd/x.csv
QUESTION (Clarified to refer to the actual trees described in the code) With this dataset, I can grow a tree, full, with formula credit ~ status + age + state + store, whose first split is by state. But using this same dataset, the tree partial, with formula credit ~ state, does not grow, i.e., it has no non-root nodes. Why does partial fail to grow when its (one) independent variable successfully produces the first split in full?
RESEARCH The most relevant Stack Overflow question that I have found is "The result of rpart is just with 1 root", but that question does not explain why a partial tree can fail to grow even when the analogous full tree does grow.
CODE
library(data.table)
library(rpart)

# Read the dataset (fread returns a data.table)
x <- fread('x.csv')

# Tree with all four predictors; its first split is by state
full <- rpart(credit ~ status + age + state + store,
              method = 'class',
              data = x,
              control = rpart.control(minsplit = 250, cp = 0.001))
plot(full, uniform = TRUE, main = 'x')
text(full, use.n = TRUE, all = TRUE, cex = 0.5)
print(full)
printcp(full)
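
# Tree with state as the only predictor; this one stays at the root
partial <- rpart(credit ~ state,
                 method = 'class',
                 data = x,
                 control = rpart.control(minsplit = 250, cp = 0.001))
# Note: plot() and text() stop with an error when the fit is only a root
plot(partial, uniform = TRUE, main = 'x')
text(partial, use.n = TRUE, all = TRUE, cex = 0.5)
print(partial)
printcp(partial)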
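
To make the two claims above checkable (that the first split of one tree is on a given variable, and that another tree has no non-root nodes), here is a minimal sketch. It is not part of my original code: the helper names root_split_var and is_root_only are hypothetical, and the kyphosis data bundled with rpart is only a stand-in for x.csv so that the snippet runs on its own.

```r
library(rpart)

# Hypothetical helper: name of the variable used for the root split,
# or NA if the fitted tree has no splits at all.
root_split_var <- function(fit) {
  v <- as.character(fit$frame$var[1])
  if (v == "<leaf>") NA_character_ else v
}

# Hypothetical helper: TRUE when the fitted tree consists of the root only.
is_root_only <- function(fit) nrow(fit$frame) == 1

# Stand-in data (assumption): kyphosis ships with rpart and replaces x.csv here.
demo_full <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class")
demo_partial <- rpart(Kyphosis ~ Start,
                      data = kyphosis, method = "class")

# Report the root split variable and whether each tree grew
print(root_split_var(demo_full))
print(is_root_only(demo_full))
print(root_split_var(demo_partial))
print(is_root_only(demo_partial))
```

Calling the same two helpers on full and partial from the code above should show the behaviour I describe, if it reproduces: a root split on state for full, and a root-only fit for partial.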