I have an inventory data table that looks like this:
set.seed(5)
library(data.table)
#replicated data
invntry <- data.table(
  warehouse = sample(c("NY", "NJ"), 1000, replace = TRUE),
  intid = c(rep(1,150), rep(2,100), rep(3,210), rep(4,50), rep(5,80), rep(6,70), rep(7,140), rep(8,90), rep(9,90), rep(10,20)),
  placement = c(1:150, 1:100, 1:210, 1:50, 1:80, 1:70, 1:140, 1:90, 1:90, 1:20),
  container = sample(1:100, 1000, replace = TRUE),
  inventory = c(rep(3242,150), rep(9076,100), rep(5876,210), rep(9572,50), rep(3369,80), rep(4845,70), rep(8643,140), rep(4567,90), rep(7658,90), rep(1211,20)),
  stock = c(rep(3200,150), rep(10000,100), rep(6656,210), rep(9871,50), rep(3443,80), rep(5321,70), rep(8659,140), rep(4567,90), rep(7650,90), rep(1298,20)),
  risk = runif(100)
)
invntry[ , ticket := 1:.N, by=c("intid", "warehouse")]
invntry$ticket[invntry$warehouse=="NJ"] <- 0
# make sure some rows of the same item share a container
invntry$container[27:32] <- 6
invntry$container[790:810] <- 71
invntry[790:820,]
There are more variables in the actual data, which I want to use to compare the same items (intid) that are in different containers. So I would like to run multiple trials over a range of sample sizes n for each item: keep randomly selecting one of the item's rows until I have rows from n different containers, keeping any duplicate rows that were drawn along the way. For a sample size of 6 for item 8, it might take 7 tries to reach 6 different containers:
    warehouse intid placement container inventory stock       risk ticket
21:        NY     8        10        71      4567  4567 0.38404806      5
22:        NY     8        11        96      4567  4567 0.64665968      6
23:        NJ     8        12        15      4567  4567 0.68265602      0
24:        NY     8        13        19      4567  4567 0.84437586      7
21:        NY     8        10        71      4567  4567 0.38404806      5
26:        NY     8        15        34      4567  4567 0.69580270      8
28:        NY     8        17        78      4567  4567 0.25352370      9
I tried searching this site, but couldn't find anything that covers the above and also lets me compute values from the sampled rows' columns for each trial and sample size. So I think I have to use a for loop, so that I can tell the trials apart for each sample size. To summarize, two goals:
1. Randomly sample rows of each intid until n unique containers have been selected, cumulatively keeping the intid rows already selected.
2. Be able to do calculations on variables for each trial, for each sample size, for each item.
Any ideas?
*It doesn't have to involve data.table; that's just how it got started.
(I think it's essentially the basic probability example of continuing to draw marbles from an urn until you have a given number of different colors, but even realizing that didn't help me find a solution!)
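For reference, here is a minimal sketch of the urn-style draw described above, written in base R since data.table is optional here. Everything in it is an assumption made for illustration: the helper name draw_until_n, the small toy data frame standing in for invntry, the sizes and n_trials values, and the choice of mean(risk) as the per-trial calculation. It draws one row at a time with replacement, so repeat draws stay in the sample, and it stops as soon as n distinct containers have been seen.

```r
set.seed(1)

# Toy stand-in for invntry (hypothetical data): 2 items, 30 rows each,
# and every item has rows in 10 distinct containers.
toy <- data.frame(
  intid     = rep(1:2, each = 30),
  container = rep(1:10, times = 6),
  risk      = runif(60)
)

# Hypothetical helper: draw rows of one item at random, with replacement,
# until n distinct containers have been seen. Repeat draws are kept,
# so the result can have more than n rows.
draw_until_n <- function(item_rows, n) {
  # Guard: the loop could never finish if the item has fewer than n containers.
  stopifnot(n <= length(unique(item_rows$container)))
  picked <- integer(0)
  while (length(unique(item_rows$container[picked])) < n) {
    picked <- c(picked, sample.int(nrow(item_rows), 1L))
  }
  item_rows[picked, , drop = FALSE]
}

sizes    <- 3:5  # sample sizes n to try (assumed values)
n_trials <- 4    # trials per item and sample size (assumed value)

# One summary row per item, sample size, and trial.
results <- list()
for (id in unique(toy$intid)) {
  item_rows <- toy[toy$intid == id, ]
  for (n in sizes) {
    for (trial in seq_len(n_trials)) {
      s <- draw_until_n(item_rows, n)
      results[[length(results) + 1L]] <- data.frame(
        intid     = id,
        n         = n,
        trial     = trial,
        tries     = nrow(s),       # draws needed to reach n containers
        mean_risk = mean(s$risk)   # example per-trial calculation
      )
    }
  }
}
summary_df <- do.call(rbind, results)
head(summary_df)
```

The tries column records how many draws each trial needed, which is the "7 tries for a sample size of 6" quantity from the example. To keep the sampled rows themselves rather than a summary, one option would be to store s with added n and trial columns in the list instead.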