I am using the Wisconsin dataset, which has two categorical columns: the IDs and the class. To carry out classification, I must drop these two columns from the data frame and then split the dataset into training and test sets (80%:20%). I have done this, but now I want to match the corresponding class back to each split dataset and put the classes for each split into a new vector.

Example:

data <- read.csv("data.csv")
data <- data[, -1]   # drop IDs
data <- data[, -10]  # drop class (column 10 once the IDs are gone)
X <- data.frame(scale(data))
dt <- sort(sample(nrow(X), floor(nrow(X) * 0.8)))  # row indices for the 80% training sample
training <- X[dt, ]
test <- X[-dt, ]

From here, I need to attach the class corresponding to each sample.

Roman Luštrik
  • Welcome to Stack Overflow! There's a great reference on how to ask a question using a reproducible example here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. It may help others help you if you can provide one. – De Novo Mar 10 '18 at 12:26
  • You are dropping class and ID. What information should be used to match values from `training` or `test` back to `data`? – Roman Luštrik Mar 10 '18 at 12:36
  • I will be using the class values to link them back to the data. I have to put the class corresponding to the split data into a vector. This is the issue I am having, though, because I also have to scale my data, which alters the class number. – Naomi Breslin Mar 10 '18 at 13:03

1 Answer

I would do it something like this:

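# read data
data <- read.csv("data.csv")

# columns to set aside: IDs (1) and class (11); the question drops column 1 and then
# column 10 of what remains, which is column 11 of the original data
id_class <- c(1, 11)

# scale the features and split the data
X <- data.frame(scale(data[, -id_class]))
dt <- sort(sample(nrow(X), floor(nrow(X) * 0.8)))  # 80% of the rows for training
training <- X[dt, ]
test <- X[-dt, ]

# add the unscaled ID and class columns back
training <- cbind(training, data[dt, id_class])
test <- cbind(test, data[-dt, id_class])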
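
If the goal is a separate class vector for each split, as the question asks, one option is to pull the class column out before scaling and index it with the same row indices. The following is only a sketch: the data frame is a made-up stand-in for data.csv, and the names `id`, `class`, `y`, `training_class` and `test_class` are assumptions for illustration, not names from the original file.

```r
# Hypothetical stand-in for data.csv: an ID column, nine numeric features, a class column.
set.seed(42)                        # fixed seed so the split is repeatable
n <- 100
data <- data.frame(
  id = seq_len(n),
  matrix(rnorm(n * 9), nrow = n),   # nine made-up features, auto-named X1..X9
  class = sample(c(2, 4), n, replace = TRUE)  # made-up labels
)

# Keep the labels in their own vector before scaling, so scale() never touches them.
y <- data$class

# Scale only the feature columns (everything except id and class).
X <- data.frame(scale(data[, !(names(data) %in% c("id", "class"))]))

# Draw the row indices for an 80% training sample.
dt <- sort(sample(nrow(X), floor(0.8 * nrow(X))))

training <- X[dt, ]
test     <- X[-dt, ]

# Index the label vector with the same row indices as the features.
training_class <- y[dt]
test_class     <- y[-dt]
```

Because `y` is taken from the unscaled data, the class values stay as they were, and `training_class[i]` lines up with row `i` of `training`.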
YOLO