Is there a way to assign vector elements to multiple subarrays in R, using sample() or split() (or a combination of both functions)?

Essentially, what I need is a function that randomly assigns values to multiple subarrays.

Here's my full specific code:

K <- 2 # number of subarrays

N <- 100

Hstar <- 10

perms <- 10000

probs <- rep(1/Hstar, Hstar)

K1 <- c(1:5)
K2 <- c(6:10)

specs <- 1:N

pop <- array(dim = c(c(perms, N), K))

haps <- as.character(1:Hstar)


for (j in 1:perms) {
    for (i in 1:K) {
        if (i == 1) {
            pop[j, specs, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
        } else {
            pop[j, , 1] <- sample(haps[K1], size = N, replace = TRUE, prob = probs[K1])
            pop[j, , 2] <- sample(haps[K2], size = N, replace = TRUE, prob = probs[K2])
        }
    }
}

pop[j,, 1] is the first subarray in pop, while pop[j,, 2] is the second subarray in pop.

If I have 20 subarrays, calling sample() 20 times is tedious. I just want a way to assign values to any number of subarrays quickly and easily.

Any ideas?

compbiostats
  • 1
    Try `sample`. It will give you randomly determined subsets, and you can tell it the size of the subset you want. – lebelinoz Oct 13 '17 at 20:56
  • 3
    I'm confused - are you asking how to create `sub1`, `sub2`, ... in a less cumbersome way? Or are you asking how to sample from `sub1` and `sub2`? The *"two equally-sized subarrays"* is confusing because in your example code `sub1` and `sub2` are different sizes, and the "sub" name suggests "subarray". Are sample sizes to be random? Or do you want equal-sized samples from arrays with different sizes? With or without replacement? – Gregor Thomas Oct 13 '17 at 21:02
  • 1
    @Frank Whoops ... I should have seen that. Changed it now. – compbiostats Oct 13 '17 at 21:02
  • @Gregor I'm trying to be as minimalist as possible. I will post my full code. – compbiostats Oct 13 '17 at 21:03
  • Please, don't post your full code! Rather, just be a little more descriptive. Maybe *augment* the existing example, but keeping it minimal is great! – Gregor Thomas Oct 13 '17 at 21:07
  • My reading of your problem now is (with questions noted): you have an input vector, `x`. You want to populate `n` equal-sized "sub-arrays" with the elements of `x`. Q: Is this a partition? That is, can `x[1]` occur in multiple sub-arrays? Is each sub-array sampling with replacement? That is, can `x[1]` occur multiple times in the same subarray? – Gregor Thomas Oct 13 '17 at 21:10
  • If you want a random partition, think about the size of each (say five and five) and then jumble a vector of assignments to groups `g = sample(rep(1:2, c(5, 5)))` then you can split the vector like `split(x, g)`. A somewhat more general example here: https://stackoverflow.com/a/36069362/ – Frank Oct 13 '17 at 21:11
  • Other parts I find confusing: *"Now say I have two equally-sized subarrays..."* How are these subarrays different from `sub1` and `sub2`? Why are we talking about them? Maybe just delete this paragraph. *"with any given number of rows and columns"*, we just have vectors right now, can we ignore "rows and columns"? If you need this to work on matrices, and rows and columns have meaning, maybe make `x` a 3x4 matrix instead of a vector. – Gregor Thomas Oct 13 '17 at 21:13
  • what's the difference in length between the original vector and the sum of subvector lengths? – 3pitt Oct 13 '17 at 21:14
  • @Gregor Yes, identical values can occur in multiple subarrays. It is a partition. In my code, however, values are not repeated among subarrays. – compbiostats Oct 13 '17 at 21:19
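
Frank's random-partition suggestion in the comments can be sketched roughly as follows. This is only an illustration: `x`, `sizes`, and `parts` are hypothetical names, and the two groups of five are assumed sizes, not something taken from the question's data.

```r
set.seed(1)                       # only so the sketch is reproducible

x     <- 1:10                     # hypothetical input vector
sizes <- c(5, 5)                  # assumed size of each group

# Build one label per element (1,1,1,1,1,2,2,2,2,2) and shuffle the labels.
g <- sample(rep(seq_along(sizes), times = sizes))

# split() returns a list with one vector per group label.
parts <- split(x, g)
```

Because every element of `x` receives exactly one label, the groups form a partition: no element is repeated or dropped. Changing `sizes` to a longer vector gives more groups without extra calls to `sample()`.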

1 Answer

It depends on whether you want replacement (the possibility of duplicate/omitted elements). Either way, it's a one-liner:

sample(x,length(x),replace=FALSE)

Not 100% clear on the whole multiple subarray thing, but my approach would be something like:

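num.intervals <- 5
interval.size <- length(x) / num.intervals  # assumes length(x) is evenly divisible by num.intervals
arr.master <- NULL  # rbind() onto NULL starts from an empty result
for (i in 1:num.intervals) {
    # each new row is one subarray, sampled with replacement from x
    arr.master <- rbind(arr.master, sample(x, interval.size, replace = TRUE))
}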

Basically, just take samples and keep mashing them together? Would this accomplish your goal?

Do you want the sizes of all subarrays to sum to the number of elements in the original array? If so, then it's just a random shuffle (really easy), after which you cut the result into any number of subarrays. If not, you could fix the total number of elements across all subarrays in advance, randomly sample a new vector of that size from the original, and then partition it into arbitrary subarrays.
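
A minimal sketch of both cases, assuming a hypothetical vector `x` of length 12 and `k = 3` subarrays (the names `shuffled`, `chunk_id`, `total`, and `drawn` are made up for illustration, and the sizes are assumed to divide evenly):

```r
set.seed(1)                       # only so the sketch is reproducible

x <- 1:12                         # hypothetical original vector
k <- 3                            # assumed number of subarrays

# Case 1: subarray sizes sum to length(x). Shuffle once, then cut.
shuffled <- sample(x)                               # random permutation of x
chunk_id <- rep(seq_len(k), each = length(x) / k)   # 1,1,1,1,2,2,2,2,3,3,3,3
parts    <- split(shuffled, chunk_id)               # list of k vectors

# Case 2: total size fixed in advance, sampling with replacement.
total  <- 15                                        # assumed total across subarrays
drawn  <- sample(x, size = total, replace = TRUE)
parts2 <- matrix(drawn, ncol = k)                   # each column is one subarray
```

In the first case every element of `x` lands in exactly one subarray; in the second, values may repeat or be missing, since the draw is with replacement.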

3pitt
  • This sounds promising. Your suggestion to break up an array into any number of subarrays after populating the larger array with values sounds much simpler than how I've been doing it (essentially the opposite way). – compbiostats Oct 13 '17 at 21:47
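
Applied to the `pop` array from the question, one possible way to avoid a separate `sample()` line per subarray is to hold the haplotype indices for each subarray in a list and loop over it. This is a sketch under assumptions, not the asker's final method: `groups` is a hypothetical name, `Hstar` is assumed to be divisible by `K`, the groups are contiguous blocks like `K1` and `K2`, and `perms` is reduced only to keep the example quick.

```r
set.seed(1)                       # only so the sketch is reproducible

K     <- 2                        # number of subarrays
N     <- 100
Hstar <- 10
perms <- 50                       # reduced from 10000 for a quick run

probs <- rep(1 / Hstar, Hstar)
haps  <- as.character(1:Hstar)

# One index vector per subarray: 1:5 and 6:10 when K = 2 (like K1 and K2).
groups <- split(seq_len(Hstar), rep(seq_len(K), each = Hstar / K))

pop <- array(NA_character_, dim = c(perms, N, K))

for (i in seq_len(K)) {
    idx <- groups[[i]]
    # Draw all perms * N values for subarray i in one call; the draws are
    # independent, so filling the whole slice at once is equivalent in
    # distribution to sampling row by row.
    pop[, , i] <- sample(haps[idx], size = perms * N, replace = TRUE,
                         prob = probs[idx])
}
```

Changing `K` to 5 or 10 needs no other edits. For a random rather than contiguous assignment of haplotypes to subarrays, the labels passed to `split()` could be shuffled first with `sample()`.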