This can be done using the setdiff() function.
Edit: Please note that there is another answer by @AlexR using negative indexing which is much simpler if the indices are only used for subsetting.
However, first we need to create some dummy data, as the OP hasn't provided any with the question (for future reference, please read How to make a great R reproducible example?):
Dummy data
Create dummy data frame with 2158 rows and two columns:
n <- 2158
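# stringsAsFactors = TRUE keeps V2 a factor on R >= 4.0, matching the output shown below
Gary <- data.frame(V1 = seq_len(n), V2 = sample(LETTERS, n, replace = TRUE),
                   stringsAsFactors = TRUE)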
str(Gary)
#'data.frame': 2158 obs. of 2 variables:
# $ V1: int 1 2 3 4 5 6 7 8 9 10 ...
# $ V2: Factor w/ 26 levels "A","B","C","D",..: 21 11 24 10 5 17 18 1 25 7 ...
Sampled and leftover rows
First, the vectors of sampled and leftover row indices are computed; they are then used to subset Gary in the subsequent steps:
set.seed(22)
sampled_rows <- sample(seq_len(nrow(Gary)), 1529, replace = FALSE)
# all row indices that were not sampled
leftover_rows <- setdiff(seq_len(nrow(Gary)), sampled_rows)
train <- Gary[sampled_rows, ]
leftover <- Gary[leftover_rows, ]
str(train)
#'data.frame': 1529 obs. of 2 variables:
# $ V1: int 657 1025 2143 1123 1817 1558 1324 1590 898 801 ...
# $ V2: Factor w/ 26 levels "A","B","C","D",..: 19 16 25 15 2 5 8 14 20 3 ...
str(leftover)
#'data.frame': 629 obs. of 2 variables:
# $ V1: int 2 5 6 7 8 9 10 12 20 24 ...
# $ V2: Factor w/ 26 levels "A","B","C","D",..: 11 5 17 18 1 25 7 25 7 18 ...
leftover is a data frame which contains the rows of Gary which haven't been sampled.
Verification
To verify, we combine train and leftover again and sort the rows to compare with the original data frame:
recombined <- rbind(train, leftover)
identical(Gary, recombined[order(recombined$V1), ])
#[1] TRUE
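Negative indexing alternative
For comparison, here is a minimal sketch of the negative-indexing approach mentioned in the edit note at the top. It is an illustration under the same dummy-data assumptions, not a copy of the other answer. It rebuilds the data so that it runs on its own, and it checks that both approaches pick the same leftover rows:

```r
# Stand-in data, built the same way as the dummy data frame above
set.seed(22)
n <- 2158
Gary <- data.frame(V1 = seq_len(n), V2 = sample(LETTERS, n, replace = TRUE))

sampled_rows <- sample(seq_len(nrow(Gary)), 1529, replace = FALSE)

train <- Gary[sampled_rows, ]
# Negative indices drop the sampled rows and keep everything else
leftover <- Gary[-sampled_rows, ]

# Compare with the setdiff() approach
leftover_rows <- setdiff(seq_len(nrow(Gary)), sampled_rows)
identical(leftover, Gary[leftover_rows, ])
#[1] TRUE
```

One caveat: if sampled_rows were ever empty, Gary[-sampled_rows, ] would return zero rows instead of all rows, whereas the setdiff() version handles that case correctly.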