How to split a dataset into train and test and merge their corresponding "class" in R

Question

I am using the Wisconsin dataset, which has two categorical columns, ID and class. To carry out classification I must drop these two columns from the data frame and then split the dataset into train and test sets (80%:20%). I have done this, but now I want to attach the corresponding class to each split dataset. Then I have to put the split classes into new vectors.

example:

data <- read.csv("data.csv")
data <- data[, -1]   # drop IDs
data <- data[, -10]  # drop class (column 10 once IDs are removed)
X <- data.frame(scale(data))
dt <- sort(sample(nrow(X), floor(nrow(X) * 0.8)))  # 80% of the row indices
training <- X[dt, ]
test <- X[-dt, ]

From here I need to merge the class corresponding to the sample.

Welcome to stack overflow! There's a great reference on how to ask a question using a reproducible example here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example It may help others help you if you can provide one. — De Novo, Mar 10 '18 at 12:26
You are dropping class and ID. What information should be used to match values from `training` or `test` back to `data`? — Roman Luštrik, Mar 10 '18 at 12:36
I will be using the class values to link them back to the data. I have to put the class corresponding to the split data into a vector. This is the issue I am having though because I also have to scale my data which alters the class number. — Naomi Breslin, Mar 10 '18 at 13:03

score 0 · Answer 1 · answered Mar 10 '18 at 15:28

I would do it something like this:

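# read data
data <- read.csv("data.csv")

# split the data
# (ID is column 1; class is column 11 of the raw file, i.e. column 10
# once ID has been dropped, as in the question's code)
X <- data.frame(scale(data[, -c(1, 11)]))
dt <- sort(sample(nrow(X), floor(nrow(X) * 0.8)))  # 80% of the row indices
training <- X[dt, ]
test <- X[-dt, ]

# add the ID and class columns back, using the same row indices
training <- cbind(training, data[dt, c(1, 11)])
test <- cbind(test, data[-dt, c(1, 11)])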
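
If you need the classes as separate vectors rather than as extra columns, one option is to set the class aside before scaling and index it with the same `dt`. The following is only a sketch: it builds a made-up data frame in place of `data.csv`, and the column names `id` and `class`, the label values 2 and 4, and the seed are assumptions for illustration, not taken from your file.

```r
set.seed(42)  # assumed seed, only so the random split is reproducible

# Hypothetical stand-in for data.csv: an id column, nine numeric features, a class column
n <- 100
data <- data.frame(id = seq_len(n),
                   matrix(sample(1:10, n * 9, replace = TRUE), nrow = n),
                   class = sample(c(2, 4), n, replace = TRUE))

# Keep the class labels aside before scaling, so scale() never touches them
class_all <- data$class

# Scale only the feature columns (selected by name rather than by position)
X <- data.frame(scale(data[, !(names(data) %in% c("id", "class"))]))

# Draw 80% of the row indices for training
dt <- sort(sample(nrow(X), floor(nrow(X) * 0.8)))

training <- X[dt, ]
test     <- X[-dt, ]

# Class vectors aligned row for row with the split data
class_train <- class_all[dt]
class_test  <- class_all[-dt]
```

Because `class_train` and `class_test` are taken from the unscaled data with the same indices, the labels keep their original values and line up with the rows of `training` and `test`.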
