J48 tree in R - train and test classification

Question

I want to train and test a J48 decision tree in R. Here is my code:

library("RWeka")

data <- read.csv("try.csv")
resultJ48 <- J48(classificationTry~., data)

summary(resultJ48)

But I want to split my data into 70% training and 30% test sets. How can I do that with the J48 algorithm?

many thanks!

How about sampling the data without replacement (see `?sample`)? – R Yoda May 08 '16 at 08:02
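The suggestion in this comment could look roughly like the sketch below, which uses only base R. It assumes the built-in `iris` data set as a stand-in for `try.csv`; the names `trainIdx`, `dataTrain` and `dataTest` are illustrative and not from the question.

```r
# iris is a built-in data set used here as a stand-in for try.csv
data <- iris

set.seed(42)                                  # make the split reproducible
n <- nrow(data)
# sample() draws without replacement by default
trainIdx <- sample(n, size = round(0.7 * n))

dataTrain <- data[trainIdx, ]                 # about 70% of the rows
dataTest  <- data[-trainIdx, ]                # the remaining rows
```

The two data frames can then be passed to `J48()` and `predict()` in the same way as in the answers.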

knb · Accepted Answer · 2016-05-08T20:52:20.813

4

Use the sample.split() function of the caTools package. It is more lightweight than the caret package (which is a meta-package, if I remember correctly):

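library(caTools)
library(RWeka)

data <- read.csv("try.csv")
# J48 needs a factor class; convert it once so the formula stays simple
data$classAttribute <- as.factor(data$classAttribute)

set.seed(123)  # make the split reproducible
# sample.split() preserves the label ratios of the vector it is given,
# so pass the class column
spl <- sample.split(data$classAttribute, SplitRatio = 0.7)

dataTrain <- subset(data, spl == TRUE)
dataTest <- subset(data, spl == FALSE)

resultJ48 <- J48(classAttribute ~ ., data = dataTrain)
dataTest.pred <- predict(resultJ48, newdata = dataTest)
# Confusion matrix: rows are actual classes, columns are predicted classes
table(dataTest$classAttribute, dataTest.pred)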

edited May 08 '16 at 20:52

answered May 08 '16 at 16:24

knb


What does dataTest.pred mean? Is it just a new variable you created to store the prediction results for dataTest? – moshem May 08 '16 at 17:07
Also, how can I see a summary of the results this way? – moshem May 08 '16 at 17:42

Use `summary(resultJ48)` to get the Weka-specific output ("Correctly Classified Instances ..."). dataTest.pred holds the output of your trained classifier applied to the 30% test data. I assumed you wanted to apply the J48 model to your test data, so I wrote what seemed natural to me. I can't infer what you actually wanted to do, because your question is very generic. `table()` just compares the actual and the predicted attribute values of the test data: a simple confusion matrix. – knb May 08 '16 at 20:55
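As a rough illustration of reading such a confusion matrix, the sketch below computes the test accuracy in base R. The label vectors are made up and only stand in for `dataTest$classAttribute` and `dataTest.pred`. As far as I know, `summary()` on an RWeka classifier reports statistics for the training data, while `evaluate_Weka_classifier()` accepts a `newdata` argument for held-out data; that call is shown only as a comment because it needs RWeka and a fitted model.

```r
# Hypothetical labels standing in for dataTest$classAttribute and dataTest.pred
actual    <- factor(c("yes", "no", "yes", "yes", "no", "no"),  levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no",  "yes", "no", "yes"), levels = c("no", "yes"))

# Confusion matrix: rows are actual classes, columns are predicted classes
cm <- table(actual = actual, predicted = predicted)

# Accuracy is the share of cases on the diagonal (actual == predicted)
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
print(accuracy)

# With RWeka loaded and a fitted model, Weka-style test statistics
# should be available along these lines:
# evaluate_Weka_classifier(resultJ48, newdata = dataTest, class = TRUE)
```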

score 1 · Answer 2 · answered May 08 '16 at 00:53


You may want to check the createDataPartition() function in the caret package.
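A minimal sketch of how that might look, assuming the caret package is installed and using the built-in `iris` data set in place of `try.csv` (the variable names are illustrative):

```r
library(caret)

data <- iris  # built-in stand-in for the asker's data

set.seed(42)  # make the split reproducible
# createDataPartition() samples within each class, so the class
# proportions are roughly preserved; list = FALSE returns row indices
trainIdx <- createDataPartition(data$Species, p = 0.7, list = FALSE)

dataTrain <- data[trainIdx, ]
dataTest  <- data[-trainIdx, ]
```

Like `sample.split()` in caTools, this is a stratified split, which helps when some classes are rare.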

answered May 08 '16 at 00:53

Psidom


score 1 · Answer 3 · answered May 08 '16 at 07:57

This is Java rather than R, but it shows the logic:

// Note: this takes the first 70% of the instances in their stored order
int trainSize = (int) Math.round(trainingSet.numInstances() * 0.7); // 70% split
int testSize = trainingSet.numInstances() - trainSize;
Instances train = new Instances(trainingSet, 0, trainSize);
Instances test = new Instances(trainingSet, trainSize, testSize);

Implement the same logic in R. Hope it helps :)
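One possible R translation of that logic is sketched below. It assumes the built-in `iris` data set as a stand-in for the real data, and it adds a shuffle step that the Java snippet does not have, because slicing by position on sorted data would put whole classes into only one of the two sets.

```r
data <- iris  # built-in stand-in; its rows are sorted by class

set.seed(1)
# Shuffle the rows first (added here; not part of the Java snippet)
shuffled <- data[sample(nrow(data)), ]

trainSize <- round(nrow(shuffled) * 0.7)   # 70% split
testSize  <- nrow(shuffled) - trainSize

train <- head(shuffled, trainSize)         # first trainSize rows
test  <- tail(shuffled, testSize)          # last testSize rows
```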

score 0 · Answer 4 · answered Jul 22 '18 at 09:37

If you don't want to use any packages other than RWeka, you can do it with runif():

library("RWeka")
data <- read.csv("try.csv")

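set.seed(123)  # make the random split reproducible
# Each row lands in the training set with probability 0.7,
# so the split is only approximately 70/30
isTrain <- runif(nrow(data)) <= 0.7

resultJ48 <- J48(classificationTry ~ ., data = data[isTrain, ])
PredTest <- predict(resultJ48, newdata = data[!isTrain, ])
table(data[!isTrain, ]$classificationTry, PredTest)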
