I have seen many variations on the "split vector X into Y chunks in R" question on here. See for example: here and here for just two. So, when I realized I needed to split a vector into Y chunks of random size, I was surprised to find that the randomness requirement might be "new"--I couldn't find a way to do this on here.
So, here's what I've drawn up:
k.chunks = function(seq.size, n.chunks) {
break.pts = sample(1:seq.size, n.chunks, replace=F) %>% sort() #Get a set of break points chosen from along the length of the vector without replacement so no duplicate selections.
groups = rep(NA, seq.size) #Set up the empty output vector.
groups[1:break.pts[1]] = 1 #Set the first set of group affiliations because it has a unique start point of 1.
for (i in 2:(n.chunks)) { #For all other chunks...
groups[break.pts[i-1]:break.pts[i]] = i #Set the respective group affiliations
}
groups[break.pts[n.chunks]:seq.size] = n.chunks #Set the last group affiliation because it has a unique endpoint of seq.size.
return(groups)
}
My question is: Is this inelegant or inefficient somehow? It will get called 1000s of times in the code I plan to do, so efficiency is important to me. It'd be especially nice to avoid the for
loop or having to set both the first and last groups "manually." My other question: Are there logical inputs that could break this? I recognize that n.chunks
cannot > seq.size
, so I mean other than that.