
As the title says, I have a txt file containing a y-variable and four x-variables:

  playtennis  outlook temperature humidity   wind
1         no    sunny         hot     high   weak
2         no    sunny         hot     high strong
3        yes overcast         hot     high   weak
4        yes     rain        mild     high   weak
5        yes     rain        cool   normal   weak
6         no     rain        cool   normal strong

The goal is to predict the y-variable (playtennis) with a classification tree.

So I first split the data into a training set and a test set:

SamSize <- floor(0.25*nrow(input.dat))
train_ind <- sample(seq_len(nrow(input.dat)), size = SamSize)
train <- input.dat[train_ind, ]
test <- input.dat[-train_ind, ]

And then to use rpart to create the classification tree:

tree1 = rpart(playtennis ~ outlook  + temperature   + humidity  + wind, data = test, subset = train, method = "class",cp=0.001,xval=20)

But I'm getting the error:

Error in `[.default`(xj, i) : invalid subscript type 'list'

I can't figure out what is wrong.

Do I need to convert my data.frame table into something else? I tried

as.matrix(train)
as.matrix(test)

but that did not solve the problem (I thought rpart might not be recognising the input).

Thank you for your suggestions!

Edit: Here is the dput() output, in case it helps solve this issue.

structure(list(playtennis = structure(c(1L, 1L, 3L, 3L, 3L, 1L, 
3L, 1L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("no", "No", "yes"
), class = "factor"), outlook = structure(c(4L, 4L, 1L, 2L, 2L, 
2L, 1L, 4L, 4L, 2L, 4L, 1L, 1L, 3L), .Label = c("overcast", "rain", 
"Rain", "sunny"), class = "factor"), temperature = structure(c(2L, 
2L, 2L, 3L, 1L, 1L, 3L, 3L, 1L, 3L, 3L, 2L, 1L, 4L), .Label = c("cool", 
"hot", "mild", "Mild"), class = "factor"), humidity = structure(c(1L, 
1L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("high", 
"High", "normal"), class = "factor"), wind = structure(c(2L, 
1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L), .Label = c("strong", 
"weak"), class = "factor")
  • It's unclear what you wish to do with the `subset` argument. This is one way to use it: `tree1 <- rpart(playtennis ~ outlook + temperature + humidity + wind, data = input.dat, subset = train_ind, method = "class", cp=0.001, xval=20)`. – Weihuang Wong Nov 14 '16 at 00:08
  • Please use `dput()` to share your data so that we can reproduce your problem – Hack-R Nov 14 '16 at 01:43
  • @Hack-R, I've included the dput(), hopefully it can be useful. I think that I must be lacking some step where I transform the data frame into some format that rpart() can more easily interpret. – kalgarianer Nov 14 '16 at 08:39
  • @WeihuangWong, Your suggestion worked for rpart(), but I cannot display the tree with plot() or text(), as it says "the fit is not a tree, just a root". This is why I'm thinking now that the problem lies in how the data frame is being read by rpart()... – kalgarianer Nov 14 '16 at 08:41
  • Your `dput()` output is incomplete. (Please check that your example is reproducible.) That said, the problem is probably **not** with how your dataframe is being read. Perhaps try the suggestion at http://stackoverflow.com/a/20994978/6455166, in particular the `minsplit` and `minbucket` arguments. – Weihuang Wong Nov 15 '16 at 00:34
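
The two suggestions in the comments can be combined in a minimal sketch. Several parts are assumptions rather than anything confirmed by the poster: the data frame is rebuilt by hand from the values visible in the truncated `dput()` output, the near-duplicate factor levels that output shows ("no"/"No", "rain"/"Rain", "mild"/"Mild", "high"/"High") are merged by lower-casing, the training fraction is raised to 75%, and `minsplit = 2` / `minbucket = 1` are illustrative values only. The key points are that `subset` takes row indices (or a logical vector), not a data frame, and that rpart's defaults (`minsplit = 20`, `minbucket = round(minsplit / 3)`) cannot split a data set this small, which would explain the "just a root" message.

```r
library(rpart)

# Assumption: data rebuilt from the values visible in the truncated dput() output.
input.dat <- data.frame(
  playtennis  = c("no", "no", "yes", "yes", "yes", "no", "yes",
                  "no", "yes", "yes", "yes", "yes", "yes", "No"),
  outlook     = c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
                  "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "Rain"),
  temperature = c("hot", "hot", "hot", "mild", "cool", "cool", "mild",
                  "mild", "cool", "mild", "mild", "hot", "cool", "Mild"),
  humidity    = c("high", "high", "high", "high", "normal", "normal", "high",
                  "high", "normal", "normal", "normal", "normal", "normal", "High"),
  wind        = c("weak", "strong", "weak", "weak", "weak", "strong", "strong",
                  "weak", "weak", "weak", "strong", "weak", "strong", "strong"),
  stringsAsFactors = FALSE
)

# Merge levels that differ only by capitalisation, then convert to factors.
input.dat[] <- lapply(input.dat, function(col) factor(tolower(col)))

# Draw row indices for the training set (75% here is an assumption;
# the question used 25%, which leaves only 3 training rows).
set.seed(1)
train_ind <- sample(seq_len(nrow(input.dat)),
                    size = floor(0.75 * nrow(input.dat)))

# Pass the full data frame as `data` and the row indices as `subset`.
# Lowering minsplit/minbucket lets rpart attempt splits on very few rows.
tree1 <- rpart(playtennis ~ outlook + temperature + humidity + wind,
               data = input.dat, subset = train_ind, method = "class",
               control = rpart.control(minsplit = 2, minbucket = 1, cp = 0.001))

print(tree1)
```

If the printed tree has more than the root node, `plot(tree1)` followed by `text(tree1)` should draw it. The held-out rows for prediction would be `input.dat[-train_ind, ]`. With so few observations, a tree grown this way is likely to overfit, so the control values are worth treating as a starting point only.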

0 Answers