I cannot understand what is going wrong here.

data.train <- read.table("Assign2.WineComplete.csv",sep=",",header=T)
# Building decision tree
Train <- data.frame(residual.sugar=data.train$residual.sugar,
                total.sulfur.dioxide=data.train$total.sulfur.dioxide, 
                alcohol=data.train$alcohol,
                quality=data.train$quality)
Pre <- as.formula("pre ~ quality")

fit <- rpart(Pre, method="class",data=Train)

I am getting the following error:

Error in eval(expr, envir, enclos) : object 'pre' not found
zx8754
Rads
  • You don't need the second or third lines of your code. Just do the `read.table` line then do: `fit <- rpart(pre ~ quality, method="class",data=data.train)`. – Thomas Oct 19 '13 at 06:45
  • I tried what you asked me to do, but I still get the same error – Rads Oct 19 '13 at 07:12
  • Is there an upper/lower case problem here? I see 'Pre' declared but the error is about 'pre'. – ako Oct 19 '13 at 07:31
  • No. If, instead of all the statements, I just write `data.train <- read.table("Assign2.WineComplete.csv",sep=",",header=T)` and then `fit <- rpart(pre ~ quality, method="class",data=data.train)`, I get the same error. – Rads Oct 19 '13 at 07:36

4 Answers

Don't know why @Janos deleted his answer, but it's correct: your data frame Train doesn't have a column named pre. When you pass a formula and a data frame to a model-fitting function, the names in the formula have to refer to columns in the data frame. Your Train has columns called residual.sugar, total.sulfur.dioxide, alcohol and quality. You need to change either your formula or your data frame so they're consistent with each other.

And just to clarify: Pre is an object containing a formula. That formula contains a reference to the variable pre. It's the latter that has to be consistent with the data frame.
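
One way to see the mismatch before fitting anything is to compare the names the formula uses with the column names of the data frame. This is only a sketch: the values in Train below are made up as a stand-in for the wine file, and the Fixed formula is just one hypothetical choice of response and predictors, not necessarily the one the assignment wants.

```r
# Hypothetical stand-in for the question's Train data frame (values are made up).
Train <- data.frame(
  residual.sugar       = c(1.9, 2.6, 2.3, 1.8),
  total.sulfur.dioxide = c(34, 67, 54, 60),
  alcohol              = c(9.4, 9.8, 9.8, 10.0),
  quality              = c(5, 5, 6, 6)
)

# The formula from the question: Pre is the object, pre is a name inside it.
Pre <- as.formula("pre ~ quality")

# Names used by the formula that are not columns of Train.
formula_vars <- all.vars(Pre)
missing_vars <- setdiff(formula_vars, names(Train))
print(missing_vars)

# A formula that only uses existing columns passes the same check.
Fixed <- quality ~ alcohol + residual.sugar
missing_fixed <- setdiff(all.vars(Fixed), names(Train))
print(length(missing_fixed))
```

Any name reported in missing_vars has to be either renamed in the formula or added to the data frame before the model-fitting call can find it.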

Hong Ooi
  • Well, I mistakenly deleted his post while trying to edit my comment. Sorry about that, @Janos. I get what you say, but when building a decision tree using rpart, can you please tell me what the formula should be? The decision tree has to be made only on the column "quality". I tried to use the example in R: `fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)` – Rads Oct 19 '13 at 08:39
  • To use `rpart`, you need a dependent variable: something that is to be predicted or estimated using the independent variables. Which of those in your data frame is the dependent? – Hong Ooi Oct 19 '13 at 08:46

This can also happen if you refer to columns by name without passing a `data` argument and you haven't attached your dataset.

zx8754

I think I got what I was looking for:

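library(rpart)  # provides rpart()

data.train <- read.table("Assign2.WineComplete.csv", sep=",", header=TRUE)
# Model quality as a class label, using every other column as a predictor
fit <- rpart(quality ~ ., method="class", data=data.train)
plot(fit)              # draw the tree
text(fit, use.n=TRUE)  # label the nodes, including observation counts
summary(fit)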
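
For anyone wondering what the dot in `quality ~ .` does: it stands for every column of the data frame other than the response. Below is a small sketch of that, using the kyphosis data that ships with rpart (the same data as the example mentioned in the comments above) because the wine file isn't available here.

```r
library(rpart)  # also makes the kyphosis example data available

# The dot expands to all columns except the response (Kyphosis).
predictors <- attr(terms(Kyphosis ~ ., data = kyphosis), "term.labels")
print(predictors)

# Fit a classification tree on all predictors and predict the training rows.
fit <- rpart(Kyphosis ~ ., data = kyphosis, method = "class")
pred <- predict(fit, type = "class")
print(table(predicted = pred, actual = kyphosis$Kyphosis))
```

With the wine data the same pattern would apply, with quality as the response and the remaining columns as predictors.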
Rads

I used colnames(train) = paste("A", colnames(train)) and ran into the same problem as yours.

I finally figured out that randomForest is stricter than rpart: it can't handle column names containing spaces, commas, or other special punctuation.

By default, paste joins its arguments with a space (" ") as the separator, so every column name ended up as "A", a space, and then the original name. To avoid the space, use this statement instead:

colnames(train) = paste("A", colnames(train), sep = "")

This prepends the string without a space.
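
Here is a small sketch of the difference, using two hypothetical column names rather than a real dataset. It also shows paste0 (the same as paste with sep = "") and make.names, which base R provides for turning arbitrary strings into syntactically valid names; using make.names here is my suggestion, not something the packages require.

```r
cols <- c("alcohol", "quality")  # hypothetical column names

# Default separator is a single space, so the new names contain a space.
with_space <- paste("A", cols)
print(with_space)

# An empty separator (or paste0) prepends the prefix without a space.
no_space <- paste("A", cols, sep = "")
same     <- paste0("A", cols)
print(no_space)

# make.names() replaces invalid characters such as spaces with dots.
syntactic <- make.names(with_space)
print(syntactic)
```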