A workaround using the purrr package. It seems that the sample_n function cannot take n() as the size argument, probably because that argument does not accept vectorized input. However, if we split the data frame into groups by color, we can apply sample_n to each group separately, using nrow() of that group as the size.
# Set seed for reproducibility
set.seed(123)
# Create example data frame
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
# Load packages
library(dplyr)
library(purrr)
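outdat <- df %>%
  # Split the data frame into a list of data frames, one per color
  split(.$color) %>%
  # Resample each group with replacement, keeping its row count,
  # then row-bind the results back into a single data frame
  map_dfr(~ sample_n(., size = nrow(.), replace = TRUE))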
outdat
# X1 X2 color
# 1 1.71506499 -1.12310858 blue
# 2 0.07050839 2.16895597 blue
# 3 0.46091621 -0.40288484 blue
# 4 0.07050839 2.16895597 blue
# 5 0.07050839 2.16895597 blue
# 6 1.71506499 -1.12310858 blue
# 7 -1.26506123 -0.46665535 blue
# 8 1.55870831 -1.26539635 blue
# 9 0.12928774 1.20796200 blue
# 10 1.55870831 -1.26539635 blue
# 11 0.55391765 -0.28477301 pink
# 12 -0.29507148 -2.30916888 pink
# 13 -0.30596266 0.18130348 pink
# 14 -0.06191171 -1.22071771 pink
# 15 0.55391765 -0.28477301 pink
# 16 0.55391765 -0.28477301 pink
# 17 0.87813349 -0.70920076 pink
# 18 0.68864025 1.02557137 pink
# 19 -0.30596266 0.18130348 pink
# 20 0.68864025 1.02557137 pink
# 21 0.70135590 0.12385424 red
# 22 0.11068272 1.36860228 red
# 23 -1.96661716 0.58461375 red
# 24 0.40077145 -0.04287046 red
# 25 1.78691314 1.51647060 red
# 26 -0.55584113 -0.22577099 red
# 27 0.40077145 -0.04287046 red
# 28 1.78691314 1.51647060 red
# 29 -0.47279141 0.21594157 red
# 30 -0.47279141 0.21594157 red
# 31 -1.02600445 -0.33320738 yellow
# 32 -0.72889123 -1.01857538 yellow
# 33 1.25381492 2.05008469 yellow
# 34 0.83778704 0.44820978 yellow
# 35 1.25381492 2.05008469 yellow
# 36 -0.62503927 -1.07179123 yellow
# 37 -0.62503927 -1.07179123 yellow
# 38 0.83778704 0.44820978 yellow
# 39 -0.21797491 -0.50232345 yellow
# 40 -1.68669331 0.30352864 yellow
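
For comparison, the same per-group resampling can probably be done without splitting the data frame at all. The following is only a sketch, assuming dplyr 1.0.0 or later, where slice_sample() is available and respects groups; it is an alternative to the split/map_dfr approach above, not a replacement for it. The name boot is just a placeholder for illustration.

```r
library(dplyr)

# Same example data as above
set.seed(123)
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# Assumption: dplyr >= 1.0.0 (slice_sample() does not exist in older versions).
# prop = 1 draws as many rows as each group has; replace = TRUE makes it
# a bootstrap sample within each color group.
boot <- df %>%
  group_by(color) %>%
  slice_sample(prop = 1, replace = TRUE) %>%
  ungroup()

# Each color should still have 10 rows
count(boot, color)
```

Because the sampling is random, the rows drawn will differ from the output shown above even with the same seed; only the group sizes are expected to match.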