
I am writing a function that should split a data set into two smaller ones at random. The size of one of the subsets is to be chosen by the user. This is my attempt:

number <- function(z, y, p) {
  indeks <- split(z$y, sample(rep(1:2), c(p, z$y - p)))
  train <- z[indeks, ]
  test <- z[-indeks, ]
  result <- list(test, train)
  list(result)
}
number(z = lipiec, y = VII, p = 200)

However, the following error pops up:

Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'

The data I am trying to split is of type int and has 574 rows, so the value 200 is not larger than the whole sample. I would like to get two randomly split sets, where one of them (test) has 200 elements and the other (train) contains the rest of the original set. Does anyone have any idea what I am doing wrong?

EDIT: After modifying the code, I now have the following:

number <- function(z, y, p) {
  df <- as.data.frame(z$y)
  indeks <- split(df, sample(nrow(df)) <= p)
  train <- indeks$
  test <- indeks$
  str(test)
}
number(z = lipiec, y = VII, p = 200)

Now I do not know what to write after indeks$ so that test and train each receive one of the two parts of the split. Does anyone have an idea?

UseR10085

2 Answers


You can try:

split(df,sample( c(rep(1,200),rep(2,574-200))))
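
The same idea can be wrapped in a function that takes the test size as an argument. The sketch below is only an illustration: the helper name split_test_train and the labels "test" and "train" are assumptions, not part of the original answer, and mtcars stands in for the asker's data. As a side note, the error in the question most likely arises because rep(1:2) has only two elements, so asking sample() for 200 of them without replacement fails.

```r
# Hypothetical helper: split a data frame into a test set of `p` rows
# and a training set holding the remaining rows.
split_test_train <- function(df, p) {
  n <- nrow(df)
  stopifnot(p >= 0, p <= n)
  # Build p labels "test" and n - p labels "train", then shuffle them.
  labels <- sample(rep(c("test", "train"), times = c(p, n - p)))
  # split() returns a list with one data frame per label.
  split(df, labels)
}

set.seed(42)  # only to make the illustration reproducible
parts <- split_test_train(mtcars, 10)
nrow(parts$test)   # 10
nrow(parts$train)  # 22
```

Because the labels are character strings, the two pieces can be picked out by name (parts$test, parts$train) rather than by position.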
Onyambu

myfun <- function(df, N) {
    split(df, sample(nrow(df))<=N)
}

set.seed(1)
myfun(mtcars,10)
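
To answer the follow-up in the question about what to assign to test and train: splitting on a logical vector yields a list whose elements are named "FALSE" and "TRUE". A minimal sketch, assuming the whole data frame is split and the function name split_named is made up for illustration:

```r
# Hypothetical variant of the function above that returns named pieces.
split_named <- function(z, p) {
  # sample(nrow(z)) is a random permutation of 1..n, so exactly p of its
  # values are <= p; those rows form the "TRUE" group.
  parts <- split(z, sample(nrow(z)) <= p)
  list(test = parts[["TRUE"]], train = parts[["FALSE"]])
}

set.seed(1)  # only to make the illustration reproducible
res <- split_named(mtcars, 10)
nrow(res$test)   # 10
nrow(res$train)  # 22
```

Note that z$y inside a function looks for a column literally named "y"; if a column has to be chosen by argument, z[[y]] with y passed as a string is the usual form.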
CPak