
(Very) amateur coder and statistician working on a problem in R.

I have four integer lists: A, B, C, D.

A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)

I want R to generate all of the permutations from picking 1 item from each of these lists (I know this code will take forever to run), and then take the mean of each of those permutations. So, for instance, [1, 100, 200, 400] -> 175.25.

Ideally, what I would have at the end is a list of all of these means.

Any ideas?

  • Do you really need all of them? If you want a sample of, say, only 1000, or a million, or 10 million of them, it's pretty trivial. But all 2.5 billion will probably run you into memory issues, though you could do them in batches, writing every 50 million to a `txt` file. – Gregor Thomas Dec 13 '12 at 00:13
  • Also, do you only need the means at the end, or do you also want the particular values of A, B, C, D, that were used to produce that instance of the mean? – Gregor Thomas Dec 13 '12 at 00:20
  • Surely there are simpler ways of answering the question, either with sampling or with theory. E[A+B] = E[A] + E[B] ... and all that jazz. You did say "statistician", right? – IRTFM Dec 13 '12 at 00:41
  • Yet another approach: if you decide on an ordering of the permutations it would be easy to write a function returning the values of A, B, C, and D for the `i`'th permutation, and thus calculate the `i`'th mean. – Gregor Thomas Dec 13 '12 at 07:50
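The last suggestion above (a function returning the values for the `i`'th permutation) can be sketched like this. It is a minimal, hypothetical helper (`combo_at` is an illustrative name, not an existing R function), assuming the same ordering that `expand.grid(A, B, C, D)` uses, where A varies fastest and D slowest. It decodes the index like a mixed-radix number, so the `i`'th mean can be computed without building the other 2.5 billion.

```r
# Hypothetical helper: values of each vector for the i'th combination, in the
# order expand.grid() would produce (first vector varies fastest, last slowest).
combo_at <- function(i, vecs) {
  idx <- i - 1                          # work with a 0-based index
  out <- numeric(length(vecs))
  for (k in seq_along(vecs)) {
    n_k <- length(vecs[[k]])
    out[k] <- vecs[[k]][idx %% n_k + 1] # current "digit" picks the element
    idx <- idx %/% n_k                  # move on to the next vector
  }
  names(out) <- names(vecs)
  out
}

vecs <- list(A = 1:133, B = 1:266, C = 1:266, D = c(1:133, 267:400))
x <- combo_at(1e9, vecs)   # the billionth combination
x
mean(x)                    # its mean
```

Indices are handled as doubles, which represent whole numbers exactly well beyond the roughly 2.5 billion needed here.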

2 Answers


Here's how I'd do this for a smaller but similar problem:

A <- 1:13
B <- 1:26
C <- 1:26
D <- c(1:13, 27:40)

mymat <- expand.grid(A, B, C, D)
names(mymat) <- c("A", "B", "C", "D")
mymat <- as.matrix(mymat)
mymeans <- rowSums(mymat)/4

You'll probably crash R if you just scale up all the indices, but you could set up a loop, something like this (not tested):

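B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)

# one pass per value of A, so only 266 * 266 * 267 rows are in memory at once
for(A in 1:133) {
    mymat <- expand.grid(A, B, C, D)
    names(mymat) <- c("A", "B", "C", "D")
    mymat <- as.matrix(mymat)
    mymeans <- rowSums(mymat)/4
    # writes matrix.1.txt, means.1.txt, ...; row.names = FALSE keeps the files smaller
    write.table(mymat, file = paste("matrix", A, "txt", sep = "."), row.names = FALSE)
    write.table(mymeans, file = paste("means", A, "txt", sep = "."), row.names = FALSE)
    # free the memory before the next pass
    rm(mymat, mymeans)
}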

to get them all. That still might be too big, in which case you could use a nested loop, or loop over D instead (since it's the biggest).
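If you only need the distribution of the means rather than every individual value, one way to apply the nested-loop idea is to loop over A and D, vectorise over B and C with `outer()`, and count how often each possible sum occurs with `tabulate()`, so nothing large is ever written to disk. This is a minimal sketch, assuming all values are positive integers as in the question (`tabulate()` ignores values below 1); `count_sums` is an illustrative name. It runs about 35,000 loop iterations, so expect it to take a while.

```r
# Count how many combinations give each possible sum a + b + c + d,
# looping over A and D and vectorising over B and C.
count_sums <- function(A, B, C, D) {
  max_sum <- max(A) + max(B) + max(C) + max(D)
  counts <- numeric(max_sum)       # counts[s] = combinations whose sum is s
  bc <- outer(B, C, "+")           # all B + C sums, computed once
  for (a in A) {
    for (d in D) {
      # tabulate() counts how often each integer 1..max_sum occurs
      counts <- counts + tabulate(bc + a + d, nbins = max_sum)
    }
  }
  counts
}

counts <- count_sums(1:133, 1:266, 1:266, c(1:133, 267:400))
sums <- which(counts > 0)
means <- sums / 4          # the distinct values the mean can take
freq <- counts[sums]       # how many combinations give each of those means
# e.g. plot(means, freq, type = "h") to view the whole distribution
```

Because every mean is an integer sum divided by 4, there are only about a thousand distinct values, so storing their counts is cheap even though there are billions of combinations.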

Alternatively,

n <- 1e7
A <- sample(133, size = n, replace= TRUE)
B <- sample(266, size = n, replace= TRUE)
C <- sample(266, size = n, replace= TRUE)
D <- sample(x = c(1:133, 267:400), size = n, replace= TRUE)
mymeans <- (A+B+C+D)/4

will give you a large sample of the means and take no time at all.

hist(mymeans)
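As the E[A+B] = E[A] + E[B] comment on the question suggests, some summaries of the full set of means need no enumeration at all. Averaging over every combination is the same as picking one element from each vector independently and uniformly at random, so the average of all the means is the average of the four vector means, and the variance of a combination's mean is the sum of the four population variances divided by 16. A minimal sketch (`pop_var` is an illustrative helper; R's `var()` uses the n - 1 denominator, so it is rescaled here):

```r
A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)

# Population variance (denominator n); var() divides by n - 1.
pop_var <- function(x) var(x) * (length(x) - 1) / length(x)

vecs <- list(A, B, C, D)

# Linearity of expectation: E[(A + B + C + D) / 4] = (E[A] + ... + E[D]) / 4
exact_mean <- sum(sapply(vecs, mean)) / 4

# Independent picks: Var[(A + B + C + D) / 4] = (sum of the four variances) / 16
exact_var <- sum(sapply(vecs, pop_var)) / 16

exact_mean
exact_var
```

`mean(mymeans)` and `var(mymeans)` from the sampling code above should land close to these values.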
Gregor Thomas

Even creating a vector of means as large as your set of permutations will use up all of your memory: about 2.5 billion doubles at 8 bytes each is roughly 20 GB. You will have to split this into smaller problems; look up writing objects to Excel and removing objects from memory (both are covered on SO).
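A quick way to see the scale of the problem in R is to count the combinations and multiply by the 8 bytes each double takes. One pitfall worth knowing: the count is larger than `.Machine$integer.max`, so multiplying the integer results of `length()` with `*` overflows to `NA` (with a warning); `prod()` works in doubles and avoids that.

```r
A <- 1:133
B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)

# prod() works in doubles; length(A) * length(B) * ... would multiply integers
# and overflow to NA, because the result exceeds .Machine$integer.max
n_combos <- prod(lengths(list(A, B, C, D)))

# every mean is stored as a double, which takes 8 bytes
gb_needed <- n_combos * 8 / 1e9

n_combos     # 2512616316 combinations
gb_needed    # roughly 20 (decimal) gigabytes
```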

As for the code to do this, I've tried to keep it as simple as possible so that it's easy to 'grow' your knowledge:

#this is how to create vectors of sequential integers in R
a <- 1:33
b <- 1:33
c <- 1:33   # works, but masks the name of the c() function; a name like cc is safer
d <- c(1:33, 267:300)

#this is how to preallocate a numeric vector with one slot per mean
means <- numeric(length(a) * length(b) * length(c) * length(d))
#set up a counter for the current position in means
i <- 1

#how you run a loop to perform this operation
for(j in seq_along(a)){
    for(k in seq_along(b)){
        for(l in seq_along(c)){
            for(m in seq_along(d)){
                y <- c(a[j], b[k], c[l], d[m])
                means[i] <- mean(y)
                i <- i + 1
            }
        }
    }
}

#and to graph your output
hist(means, col='brown')
#let's put a line at the mean through the histogram
abline(v=mean(means), col='white', lwd=2)
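For comparison, R can usually do this without explicit loops. A minimal sketch using `outer()` on the same small vectors: nesting it builds an array holding every a + b + c + d sum, which is then divided by 4. The means come out in a different order from the loop above (here the first vector varies fastest, as in `expand.grid()`), but the set of values is the same, and the result can be passed to `hist()` just like `means`. `all_means` and `cc` are illustrative names.

```r
a <- 1:33
b <- 1:33
cc <- 1:33              # 'cc' rather than 'c', so the c() function is not masked
d <- c(1:33, 267:300)

# outer(x, y, "+") adds every element of x to every element of y; nesting it
# gives a 33 x 33 x 33 x 67 array of every possible a + b + cc + d sum
sums <- outer(outer(outer(a, b, "+"), cc, "+"), d, "+")

# flatten to a plain vector (first dimension varies fastest) and divide by 4
all_means <- as.vector(sums) / 4

length(all_means)       # 33 * 33 * 33 * 67 = 2407779
```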
lilster
  • Your `seq()` calls are a bit off: to get all integers from 1 to 33, you just use `1:33`, without the `seq`, and to get `1:33` then `267:300` in the same vector, it's just `c(1:33, 267:300)`, again without `seq`. Yours does something a bit different. – Marius Dec 13 '12 at 01:02