How to split data into training and validation in R?

Question

The question says:

Load the data and split it into 75% training and 25% validation data using set.seed(4650).

This is what I have:

setwd("C:/Users/Downloads")
cat = read.csv("cat.csv")
set.seed(4650)
train = sample(c(TRUE, TRUE, TRUE, FALSE), nrow(cat), replace = TRUE)
validation = (!train)

I also need to provide a summary of the training data.

summary(train)

which gives me

Mode       FALSE   TRUE
logical    830     2463

Am I splitting the data in the right way?

Thank you very much.

score 6 · Accepted Answer · answered Oct 15 '17 at 23:57

This is how data splitting is done in Max Kuhn's book on the caret package.

library(caret)
set.seed(4650)
trainIndex <- createDataPartition(iris$Species, 
                                  p = .75, 
                                  list = FALSE, 
                                  times = 1)

irisTrain <- iris[ trainIndex,]
irisTest  <- iris[-trainIndex,]
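
createDataPartition() samples within each level of the outcome, so the class proportions stay roughly the same in both sets. If caret is not available, the same idea can be sketched in base R. This is only an illustration of per-class sampling, not caret's own implementation; the helper name idx_by_class and the use of ceiling() for the per-class size are assumptions.

```r
# Base-R sketch of a stratified 75/25 split (not caret's implementation).
set.seed(4650)

# Group the row indices by class label.
idx_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Within each class, draw 75% of the rows (rounded up) without replacement.
# Indexing with sample.int() avoids the sample(x) pitfall when a class
# happens to contain a single row.
train_idx <- unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), size = ceiling(0.75 * length(i)))]
}), use.names = FALSE)

irisTrain <- iris[train_idx, ]
irisTest  <- iris[-train_idx, ]

# Class counts in each set, to confirm the split is balanced.
table(irisTrain$Species)
table(irisTest$Species)
```

Because every species in iris has 50 rows, each class contributes the same number of rows to the training set here.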

score 4 · Answer 2 · answered Oct 15 '17 at 23:58

Here's what you can do.

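# Example data
df <- iris

set.seed(4650)  # make the random split reproducible

# number of rows for a 75% training set
n_train <- round(nrow(df) * 0.75)

# sample training row indices without replacement; the rest form the test set
train <- sample(seq_len(nrow(df)), n_train, replace = FALSE)
test <- setdiff(seq_len(nrow(df)), train)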

train_df <- df[train, ]
test_df <- df[test, ] # same as df[-train, ]

summary(train_df)
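
For comparison with the code in the question, here is a rough sketch of the difference between the two approaches, using iris as a stand-in for cat.csv (an assumption, since that file is not available here). Sampling TRUE/FALSE flags with replacement gives each row a 3-in-4 chance of landing in the training set, so the split is only approximately 75%, which matches the 2463 of 3293 rows in the question. Note also that summary() on the logical vector only counts the flags; to summarise the training data, subset the data frame first.

```r
set.seed(4650)
n <- nrow(iris)

# Approach from the question: one independent TRUE/FALSE draw per row.
# The number of TRUE values varies around 0.75 * n.
train_flag <- sample(c(TRUE, TRUE, TRUE, FALSE), n, replace = TRUE)
n_flagged <- sum(train_flag)

# Fixed-size approach: draw exactly round(0.75 * n) row indices.
train_idx <- sample(seq_len(n), size = round(0.75 * n))
n_exact <- length(train_idx)

# summary(train_flag) would only report the FALSE/TRUE counts.
# Subset the data frame to summarise the training rows themselves.
train_df <- iris[train_flag, ]
validation_df <- iris[!train_flag, ]
summary(train_df)
```

Either split is usable; the fixed-size version is the one to pick if the assignment expects exactly 75% of the rows.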

kangaroo_cliff


I want to fit auto.arima models to multiple time series, using 1 year of data, then 3 years, 5, 7... (in two-year steps) from each series, and test each model on the testing set. How do I do the subsetting so that each fitted model uses the data I want? I appreciate your help. – Stackuser Apr 09 '20 at 04:10
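
One possible way to do the subsetting asked about in this comment is base R's window() on a ts object. The sketch below is only a guess at the intended setup: it uses the built-in AirPassengers series as a stand-in for one of the series, assumes training windows of 1, 3, 5 and 7 years counted from the start, and assumes the test set is everything after the longest window. The names train_years, train_sets and test_set are hypothetical.

```r
y <- AirPassengers                 # built-in monthly series, Jan 1949 to Dec 1960
start_year <- start(y)[1]

train_years <- c(1, 3, 5, 7)       # assumed window lengths, in years

# Assumed test set: all observations after the longest training window.
test_set <- window(y, start = c(start_year + max(train_years), 1))

# One training subset per window length, each starting at the first observation.
train_sets <- lapply(train_years, function(k) {
  window(y, start = c(start_year, 1), end = c(start_year + k - 1, 12))
})
names(train_sets) <- paste0(train_years, "yr")

# Number of monthly observations in each training subset.
sapply(train_sets, length)

# Each element of train_sets could then be passed to forecast::auto.arima(),
# assuming the forecast package is installed, and evaluated against test_set.
```

For several series, the same lapply() could be wrapped in another loop over a list of ts objects.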
