I think your best bet is to build a list of data frames rather than using a for loop as in your approach. We can do this with replicate(), which is a wrapper around sapply() and therefore uses lapply() under the hood.
First, let's create a dummy data frame df that mimics your data, with 5 columns and 24,347 observations:
df <- data.frame(a = rnorm(24347),
                 b = rnorm(24347),
                 c = rnorm(24347),
                 d = rnorm(24347),
                 e = rnorm(24347))
Next, set the number of iterations you want, and how big each subset sample should be:
iterations <- 10
subset_size <- 100
Finally, create a list of sampled data frames:
samples_list <- replicate(n = iterations,
                          expr = {df[sample(nrow(df), subset_size), ]},
                          simplify = FALSE)
This repeats the expression df[sample(nrow(df), subset_size),] for however many iterations you desire and places each newly created data frame in the list samples_list.
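If you want the draws to be reproducible, you can seed the random number generator first and then check what came back. This is only a minimal sketch: the seed value is arbitrary, and the small two-column df is a stand-in for your real data. Note that sample() draws row indices without replacement by default.

```r
# Seed the RNG so the same rows are drawn on every run (42 is arbitrary).
set.seed(42)

# Small placeholder data frame; swap in your own data here.
df <- data.frame(a = rnorm(1000), b = rnorm(1000))

iterations  <- 10
subset_size <- 100

# One sampled data frame per iteration, kept as a plain list.
samples_list <- replicate(n = iterations,
                          expr = df[sample(nrow(df), subset_size), ],
                          simplify = FALSE)

length(samples_list)        # number of data frames in the list
sapply(samples_list, nrow)  # row count of each sampled data frame
```

Both checks should match iterations and subset_size respectively; if they do not, the sampling step is not doing what you expect.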
You access the data frames just like you would access any other list element:
samples_list[[1]]
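Just remember the double brackets around your data frame element: single brackets (samples_list[1]) return a one-element list rather than the data frame itself, so row and column indexing will not work on the result. From here, you can access any particular row or column as normal: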
samples_list[[dataframe]][row,column]
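To make the bracket difference concrete, here is a small sketch on toy data. The sizes, the seed, and the name sample_means are just for illustration. It also shows one way to run the same computation over every sampled data frame without writing a loop.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Toy data and a short list of three samples of 10 rows each.
df <- data.frame(a = rnorm(50), b = rnorm(50))
samples_list <- replicate(3, df[sample(nrow(df), 10), ], simplify = FALSE)

class(samples_list[1])    # "list": single brackets keep the list wrapper
class(samples_list[[1]])  # "data.frame": double brackets extract the element

# First three rows of column a in the second sampled data frame.
samples_list[[2]][1:3, "a"]

# Apply the same computation to every sample, here the mean of column a.
sample_means <- sapply(samples_list, function(d) mean(d$a))
```

Keeping the samples in a list is what makes the last line possible: sapply() or lapply() can walk over all of them at once.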
If you need more info on lists, I would head over to this post: https://stackoverflow.com/a/24376207/6535514