
DATA I have a 5-MB, 150k-row dataset that I am trying to analyze using decision trees with R and the rpart package: http://www.mediafire.com/download/3x2b3r9ccj8r1gd/x.csv

QUESTION (Clarified to refer to the actual trees described in the code) With this dataset, I can grow a tree `full` with formula `credit ~ status + age + state + store`, whose first split is by `state`. But with the same dataset, the tree `partial` with formula `credit ~ state` does not grow, i.e., it has no non-root nodes. Why does `partial` fail to grow when its one independent variable successfully produces the first split in `full`?

RESEARCH The most relevant Stack Overflow question I have found is the following, but it does not explain why a partial tree can fail to grow when the analogous full tree does: "The result of rpart is just with 1 root"

CODE

library(data.table)
library(rpart)

x <- fread('x.csv')

full <- rpart(credit ~ status + age + state + store,
              method = 'class',
              data = x,
              control = rpart.control(minsplit = 250, cp = 0.001))
plot(full, uniform = TRUE, main = 'x')
text(full, use.n = TRUE, all = TRUE, cex = 0.5)
print(full)
printcp(full)

partial <- rpart(credit ~ state,
                 method = 'class',
                 data = x,
                 control = rpart.control(minsplit = 250, cp = 0.001))
plot(partial, uniform = TRUE, main = 'x')
text(partial, use.n = TRUE, all = TRUE, cex = 0.5)
print(partial)
printcp(partial)
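
DIAGNOSTIC SKETCH One check that may help narrow this down is to look at the class proportions of `credit` within each `state`. If every state has the same majority class, a split on `state` alone leaves the number of misclassified rows unchanged, so with `cp > 0` it could be pruned away, whereas in `full` later splits on other variables might still justify keeping it. This is only one possible explanation and is not confirmed anywhere in this thread. The data frame `toy` below is made up for illustration; only the column names `credit` and `state` come from the code above.

```r
# Toy stand-in for x (hypothetical values; only the column names match the post)
toy <- data.frame(
  state  = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  credit = c("good", "good", "good", "bad",
             "good", "good", "good", "good", "bad", "bad"),
  stringsAsFactors = TRUE
)

# Row-wise class proportions: one row per state, one column per credit class
props <- prop.table(table(toy$state, toy$credit), margin = 1)
print(round(props, 2))

# Majority class within each state
majority <- colnames(props)[apply(props, 1, which.max)]
names(majority) <- rownames(props)
print(majority)

# TRUE when every state would predict the same class on its own
same_majority <- length(unique(majority)) == 1
print(same_majority)
```

On the real data, the same table would be built from `x$state` and `x$credit`.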
  • "I can grow a tree involving several independent variables --- but not a tree involving just one of those variables." - what does the second part mean? What do you mean that you "cannot grow a tree involving one variable"? – user31264 May 19 '16 at 05:29
  • @user31264: x.tree.credit has a formula credit ~ status + age + state + store; this 'full' tree grows, and its first split is by state. But x.tree.credit.state has a formula credit ~ state; this 'partial' tree does not grow. Why does x.tree.credit.state fail to grow when its independent variable is the variable that produces the first split in x.tree.credit? – PDE May 19 '16 at 11:15
  • What do you mean by "does not grow"? – user31264 May 19 '16 at 16:15
  • @user31264: (Using variables from main post) `partial` has only one node, i.e., its root, whereas `full` has more than one node. What is mysterious here to me is that `partial` uses the same predictor variable that produces the first split in `full`. I'd therefore expect that `partial` would have at least that first `full` split. But that is not the case. – PDE May 19 '16 at 18:45
  • Maybe it would be helpful to publish (via dput) some small x.csv file which reproduces the error, and the appropriate minsplit for this x.csv. – user31264 May 19 '16 at 21:36
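
The `dput` suggestion in the last comment could be sketched roughly as follows. The data frame `toy` and its values are hypothetical stand-ins for `x`, and there is no guarantee that a small random sample of the real data still reproduces the one-node tree; `minsplit` would likely need to be lowered to suit the sample size. Because `x` comes from `fread` and is a data.table, converting it with `as.data.frame()` before calling `dput` avoids an internal pointer attribute showing up in the output.

```r
# Toy stand-in for x (hypothetical values; the real x comes from fread('x.csv'))
toy <- data.frame(
  credit = c("good", "bad", "good", "good", "bad", "good"),
  state  = c("A", "A", "B", "B", "C", "C"),
  age    = c(31L, 45L, 27L, 52L, 38L, 60L),
  stringsAsFactors = FALSE
)

set.seed(1)                            # make the sample reproducible
small <- toy[sample(nrow(toy), 4), ]   # keep 4 random rows
rownames(small) <- NULL                # drop the original row indices

dput(small)                            # prints R code that rebuilds `small`

# Round-trip check: the printed code evaluates back to an equal data frame
dumped  <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = dumped))
print(isTRUE(all.equal(rebuilt, small)))
```

The text printed by `dput(small)` can be pasted directly into a question so that others can recreate the sample without downloading a file.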
