
Everything works when I run the chunks interactively, but an error occurs when I knit my .Rmd file.


########### needed for testing purpose #################

library(tree)
set.seed(77191)


library(ISLR)
library(randomForest)
attach(Carseats)
n <- nrow(Carseats)
indices <- sample(1:n, n / 2, replace = FALSE)
cstrain <- Carseats[indices, ]
cstest <- Carseats[-indices, ]

tree.cs <- tree(Sales ~. , data = cstrain)
summary(tree.cs)
plot(tree.cs)
text(tree.cs)
y_hat <- predict(tree.cs, newdata = cstest)
test.mse <- mean((y_hat - cstest$Sales)^2)  # test MSE
test.mse
######################################################

# 2nd chunk
cv.cs <- cv.tree(tree.cs)
cx =cv.cs$size
cy =cv.cs$dev
mymy <- xy.coords(cx,cy)
plot(mymy, xlab = "size", ylab = "dev",  type = "b")
mini.tree <-which.min(cv.cs$dev)
points(mini.tree,cv.cs$dev[mini.tree], col="green", cex= 2, pch = 20)

The 2nd chunk yields a plot (image omitted).


#3rd chunk
#pruning
prune.cs <- prune.tree(tree.cs, best = mini.tree)
plot(prune.cs) # the problematic part

y_hat <- predict(prune.cs, newdata = cstest)

mean((y_hat - cstest$Sales)^2)

The 3rd chunk should yield a plot of the pruned tree (image omitted).

Not a duplicate of:

'x' is a list, but does not have components 'x' and 'y'

Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not have components 'x' and 'y'

Did not solve the problem:

Fit a Decision Tree classifier to the data; Error in code

I know that plot() needs coordinates in order to run, but here I am trying to plot a tree. Also, the code has run many times before; it just won't knit.

The 1st chunk is included in case you want to try it yourself.

Thank you.

1 Answer

I suppose your problematic line should be

prune.cs <- prune.tree(tree.cs, best = cv.cs$size[mini.tree])

instead of

prune.cs <- prune.tree(tree.cs, best = mini.tree)

You are not interested in the index, which can change every time you run the cross-validation, but in the tree size at that index.

The same thing is true in the 2nd chunk where you have

points(mini.tree,cv.cs$dev[mini.tree], col="green", cex= 2, pch = 20)

which should be

points(cv.cs$size[mini.tree], cv.cs$dev[mini.tree], col="green", cex= 2, pch = 20)
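
To make the index-versus-size distinction concrete, here is a minimal base R sketch. The list `cv_demo` and its numbers are hypothetical stand-ins shaped like the `size` and `dev` components used above; they do not come from the Carseats fit, and the names `best_index` and `best_size` are introduced only for illustration.

```r
# Hypothetical numbers shaped like the size/dev components of a cv.tree() result;
# they are not taken from the Carseats fit in the question.
cv_demo <- list(
  size = c(8, 6, 5, 3, 1),            # number of terminal nodes per candidate tree
  dev  = c(520, 480, 495, 610, 900)   # cross-validated deviance for each size
)

best_index <- which.min(cv_demo$dev)    # position of the smallest deviance
best_size  <- cv_demo$size[best_index]  # terminal nodes at that position

best_index  # 2
best_size   # 6
```

With these made-up values, passing `best_index` as `best` would request a 2-leaf tree, while `best_size` requests the 6-leaf tree that has the lowest deviance. The two only coincide by accident.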
user12728748
  • Could you expand a little on "the tree size at that index" to nail down the conceptual difference? Thanks. – Chris Apr 02 '20 at 04:41
  • `which.min(cv.cs$dev)` gives you the index at which `cv.cs$dev` is minimal. `cv.cs$size[mini.tree]` gives you the number of terminal nodes at which `cv.cs$dev` is minimal, which is what you want to pass as `best`. You want to prune your tree to that size, not to size 1 merely because `cv.cs$dev[1]` is minimal. – user12728748 Apr 02 '20 at 13:07