
I have created multiple data frames based on various conditions. Now I would like to sample from these data frames, but I want each row to be removed once it has been sampled. I have tried dplyr's `sample_n`:

```r
sample_n(df, 4)
```

The problem is that this does not remove the sampled rows. Would I need some recursive loop that removes rows once they are sampled, or is there a handy function that can help me?
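To make the requirement concrete, here is a minimal base-R sketch of the loop the question describes (plain iteration, no recursion). It assumes a single data frame; `df`, its `id` column and the batch size of 4 are illustrative, not from the question. Each pass draws 4 row positions without replacement and then drops those rows from `remaining`, so no row can be drawn twice:

```r
set.seed(1)
# Hypothetical data frame standing in for one of the question's data frames
df <- data.frame(id = 1:10, value = rnorm(10))

batch_size <- 4
draws <- list()
remaining <- df

# Draw batches of 4 rows until fewer than 4 rows are left
while (nrow(remaining) >= batch_size) {
  rows <- sample(nrow(remaining), batch_size)     # row positions, drawn without replacement
  draws[[length(draws) + 1]] <- remaining[rows, ]
  remaining <- remaining[-rows, ]                 # drop them so they cannot be drawn again
}

length(draws)    # 2 batches
nrow(remaining)  # 2 rows left over
```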

Lowpar
  • Please provide a small reproducible example and expected output – akrun Jan 17 '17 at 08:40
  • Have a look at the `modelr` package for the tidyverse approach. – Axeman Jan 17 '17 at 09:06
  • @akrun, it is not the same question. I did not merely want to sample the data frame; I need to avoid sampling the same rows again in subsequent draws. – Lowpar Jan 17 '17 at 09:53
  • I think it could be categorized as a general dupe as was done [here](http://stackoverflow.com/questions/41689941/moving-variablecolumns-to-column-name-vertical-to-horizontal-in-r/41689954#41689954). Anyway, I am reopening it if you find it objectionable – akrun Jan 17 '17 at 09:56
  • @akrun, I think it was my fault for not phrasing the title correctly. – Lowpar Jan 17 '17 at 10:01

2 Answers


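Works for me: keep the data frames in a list, pick one at random, and drop a randomly sampled row from it so that the row cannot be drawn again.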

```r
# generate data
a <- data.frame(letters = letters[1:5], var = rnorm(5))
b <- data.frame(letters = letters[6:10], var = rnorm(5))
c <- data.frame(letters = letters[11:15], var = rnorm(5))
xy <- list(a, b, c)

set.seed(357) # set seed so the data frame and row choices are reproducible
dfsample <- sample(seq_along(xy), 1) # sample out one data.frame

xy[[dfsample]]
```

```
  letters         var
1       a  1.51348192
2       b -0.60657737
3       c  0.51828252
4       d -0.05352487
5       e -1.34303266
```

```r
# remove one random row; note the minus sign in front of sample()
xy[[dfsample]] <- xy[[dfsample]][-sample(seq_len(nrow(xy[[dfsample]])), 1), ]
xy[[dfsample]]
```

```
  letters         var
2       b -0.60657737
3       c  0.51828252
4       d -0.05352487
5       e -1.34303266
```
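A variant that avoids modifying the data frames in place, sketched here as an alternative rather than the answerer's code: shuffle each data frame's rows once with `sample(nrow(d))`, then hand out consecutive chunks. Taking chunks from a single permutation is equivalent to repeatedly sampling and removing rows. The list names `a`, `b`, `c` and `batch_size = 2` are illustrative assumptions.

```r
set.seed(42)
# Example list of data frames, mirroring the answer's a, b, c (values are arbitrary)
xy <- list(
  a = data.frame(letters = letters[1:5],   var = rnorm(5)),
  b = data.frame(letters = letters[6:10],  var = rnorm(5)),
  c = data.frame(letters = letters[11:15], var = rnorm(5))
)

batch_size <- 2  # illustrative batch size

# Shuffle each data frame's rows once (a permutation, so no row repeats),
# then split the shuffled rows into consecutive batches of at most batch_size
batches <- lapply(xy, function(d) {
  shuffled <- d[sample(nrow(d)), ]
  split(shuffled, ceiling(seq_len(nrow(shuffled)) / batch_size))
})

batches$a[[1]]  # first batch drawn from a
```

Drawing the next batch is then just indexing the next element, e.g. `batches$a[[2]]`; only the last batch can be smaller than `batch_size`.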
Roman Luštrik
```r
modelr::crossv_mc(mtcars, n = 5, test = 0.5)
```

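creates 5 random train/test splits of `mtcars`. Within each split, the training and test sets are disjoint and, with `test = 0.5`, of equal size. The splits are stored as list columns of class `resample`, which is memory efficient because each one stores row indices into the original data rather than a copy of the rows.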

```
# A tibble: 5 × 3
           train           test   .id
          <list>         <list> <chr>
1 <S3: resample> <S3: resample>     1
2 <S3: resample> <S3: resample>     2
3 <S3: resample> <S3: resample>     3
4 <S3: resample> <S3: resample>     4
5 <S3: resample> <S3: resample>     5
```
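For readers without `modelr`, here is a base-R sketch of what one row of that tibble represents with `test = 0.5`; it is an illustration, not `modelr`'s implementation. Note that the 5 splits are drawn independently, so the same row can land in the test set of several splits. That suits resampling, but it is not the same as drawing disjoint batches.

```r
# One Monte Carlo train/test split of mtcars, done by hand in base R
set.seed(1)
n <- nrow(mtcars)                        # 32 rows
test_idx <- sample(n, round(n * 0.5))    # row positions for the test half
test  <- mtcars[test_idx, ]
train <- mtcars[-test_idx, ]             # every row not in test

c(train = nrow(train), test = nrow(test))           # 16 and 16
length(intersect(rownames(train), rownames(test)))  # 0: the halves are disjoint
```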
Axeman