As the title says, I have a txt file containing a y-variable and 4 x-variables:
playtennis outlook temperature humidity wind
1 no sunny hot high weak
2 no sunny hot high strong
3 yes overcast hot high weak
4 yes rain mild high weak
5 yes rain cool normal weak
6 no rain cool normal strong
The goal is to predict the y-variable (playtennis) with a classification tree.
So I first split the data into a training set and a test set:
SamSize <- floor(0.25*nrow(input.dat))
train_ind <- sample(seq_len(nrow(input.dat)), size = SamSize)
train <- input.dat[train_ind, ]
test <- input.dat[-train_ind, ]
Then I use rpart to build the classification tree:
tree1 = rpart(playtennis ~ outlook + temperature + humidity + wind, data = test, subset = train, method = "class",cp=0.001,xval=20)
But I'm getting the error:
Error in `[.default`(xj, i) : invalid subscript type 'list'
I can't figure out what is wrong.
Do I need to convert my data.frame into something else? I tried
as.matrix(train)
as.matrix(test)
and it did not solve the problem (I thought rpart might not be recognising the input).
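For reference, here is a minimal sketch of what is probably going on, not a confirmed diagnosis. In the call above, `subset = train` passes a whole data.frame, whereas `subset` in `rpart()` is meant to be a vector of row indices (or a logical vector) selecting rows of `data`, which would explain the "invalid subscript type 'list'" message. The call also fits on `test` rather than `train`. The toy data frame below copies only the six rows shown at the top; the fixed indices in `train_ind` and the `minsplit = 2`, `xval = 0` settings are assumptions made so the example is deterministic and runs on so few rows.

```r
library(rpart)

# Toy copy of the six rows shown above (assumed to mirror input.dat)
input.dat <- data.frame(
  playtennis  = factor(c("no", "no", "yes", "yes", "yes", "no")),
  outlook     = factor(c("sunny", "sunny", "overcast", "rain", "rain", "rain")),
  temperature = factor(c("hot", "hot", "hot", "mild", "cool", "cool")),
  humidity    = factor(c("high", "high", "high", "high", "normal", "normal")),
  wind        = factor(c("weak", "strong", "weak", "weak", "weak", "strong"))
)

# Fixed row indices for a deterministic example (the question draws them with sample())
train_ind <- c(1, 3, 4, 6)
train <- input.dat[train_ind, ]
test  <- input.dat[-train_ind, ]

# Assumed settings so that a tree can be grown on only four rows
ctrl <- rpart.control(minsplit = 2, cp = 0.001, xval = 0)

# Option A: fit on the training data frame, with no subset argument
tree_a <- rpart(playtennis ~ outlook + temperature + humidity + wind,
                data = train, method = "class", control = ctrl)

# Option B: pass the full data and give subset the index vector, not a data.frame
tree_b <- rpart(playtennis ~ outlook + temperature + humidity + wind,
                data = input.dat, subset = train_ind,
                method = "class", control = ctrl)

# Predict the held-out rows with the fitted tree
pred <- predict(tree_a, newdata = test, type = "class")
```

Either option keeps the data as a data.frame, so the `as.matrix()` conversion should not be needed.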
Thank you for your suggestions!
Edit: Here is the dput() output, in case it helps with solving this issue.
structure(list(playtennis = structure(c(1L, 1L, 3L, 3L, 3L, 1L,
3L, 1L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("no", "No", "yes"
), class = "factor"), outlook = structure(c(4L, 4L, 1L, 2L, 2L,
2L, 1L, 4L, 4L, 2L, 4L, 1L, 1L, 3L), .Label = c("overcast", "rain",
"Rain", "sunny"), class = "factor"), temperature = structure(c(2L,
2L, 2L, 3L, 1L, 1L, 3L, 3L, 1L, 3L, 3L, 2L, 1L, 4L), .Label = c("cool",
"hot", "mild", "Mild"), class = "factor"), humidity = structure(c(1L,
1L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("high",
"High", "normal"), class = "factor"), wind = structure(c(2L,
1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L), .Label = c("strong",
"weak"), class = "factor")