I am using R to classify a data frame called 'd' containing data structured like below:
The data has 576666 rows, and the column "classLabel" is a factor with 3 levels: ONE, TWO, THREE.
I am making a decision tree using rpart:
fitTree = rpart(d$classLabel ~ d$tripduration + d$from_station_id + d$gender + d$birthday)
I want to predict the values of "classLabel" for newdata:
newdata = data.frame(tripduration    = c(345, 244, 543, 311),
                     from_station_id = c(60, 28, 100, 56),
                     gender          = c("Male", "Female", "Male", "Male"),
                     birthday        = c(1972, 1955, 1964, 1967))
p <- predict(fitTree, newdata)
I expect the result to be a matrix with 4 rows, each giving the probabilities of the three possible values of "classLabel" for the corresponding row of newdata. But what I get in p has 576666 rows, like below:
I also get the following warning when running the predict function:
Warning message:
'newdata' had 4 rows but variables found have 576666 rows
What am I doing wrong?
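
For reference, one likely cause is the d$ prefix in the formula: the model terms are then named `d$tripduration` and so on, so predict() cannot match them to the columns of newdata and falls back to the original vectors in d, which would explain both the 576666 rows and the warning. Below is a minimal sketch of the usual alternative, with bare column names and a data argument. The data frame here is a small synthetic stand-in (an assumption, since the real d is not shown), and method = "class" and type = "prob" are spelled out explicitly as my own choice.

```r
library(rpart)

# Synthetic stand-in for 'd' (hypothetical values; the real d has 576666 rows)
set.seed(1)
n <- 300
d <- data.frame(
  tripduration    = sample(100:900, n, replace = TRUE),
  from_station_id = sample(1:100, n, replace = TRUE),
  gender          = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  birthday        = sample(1950:1995, n, replace = TRUE)
)
# Assumed labelling rule, only so that the tree has something to split on
d$classLabel <- cut(d$tripduration,
                    breaks = c(-Inf, 300, 600, Inf),
                    labels = c("ONE", "TWO", "THREE"))

# Bare column names plus data = d, instead of d$column inside the formula
fitTree <- rpart(classLabel ~ tripduration + from_station_id + gender + birthday,
                 data = d, method = "class")

# Column names in newdata must match the names used in the formula
newdata <- data.frame(
  tripduration    = c(345, 244, 543, 311),
  from_station_id = c(60, 28, 100, 56),
  gender          = factor(c("Male", "Female", "Male", "Male"),
                           levels = levels(d$gender)),
  birthday        = c(1972, 1955, 1964, 1967)
)

# Class probabilities, one row per row of newdata
p <- predict(fitTree, newdata, type = "prob")
dim(p)
```

With the formula written this way, p should have one row per row of newdata and one column per level of classLabel.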