I want to simulate a vector with 100 elements in R. The vector contains only the numeric values 0, 1 or 2, and I know only its sum. For example, if the sum is 30, the vector could contain 77 zeros, 16 ones and 7 twos. How can I simulate such a vector in R given only its sum?
-
Can you also have the sum = 77 and numbers such that 0 = 0 , 1 = 77, 2 = 0? – Mitchell Deane Jul 18 '19 at 05:54
-
Are you just hoping to find *one* combination such that the sum is correct? If so, the runtime of such a process is indeterminate, and though unlikely, it could run for quite a long time before finding one. – r2evans Jul 18 '19 at 05:55
-
Just curious, is this homework? There isn't a rule against it, but it can shape how we answer. (And having us solve your homework for you completely is truly a detriment to you.) – r2evans Jul 18 '19 at 05:57
-
I don't want to see 0=0, 1=77, 2=0 if the sum = 77. The only requirement is that the sum is correct and that there are some 1s and some 2s. I am going to simulate several such vectors that satisfy the condition. The vectors should not all have the same pattern. – Mizzle Jul 18 '19 at 05:59
-
It is homework, but the background is minor allele frequency (MAF) from genotypes. I am given a minor allele frequency and have to simulate a vector that satisfies it. The expected simulated data looks like the data in this question: https://www.researchgate.net/post/How_to_calculate_MAF_from_genotypes_only. – Mizzle Jul 18 '19 at 06:02
-
Have you tried anything, Mizzle? I'd think this could be done with a straightforward combination of `while`, `sum`, and `sample`, perhaps with a built-in counter to preclude searching forever. – r2evans Jul 18 '19 at 06:06
-
I think this question is pretty interesting, and might require a clever and novel solution. Simple solutions based on `sample` will work OK for certain constraints but I think others would be much harder to meet just with naive sampling. – Marius Jul 18 '19 at 06:08
-
I tried, but I only know how to generate vectors with the same pattern. For example, if sum = 10 then 1 = 10, 0 = 90, 2 = 0; if sum = 20 then 1 = 20, 0 = 80, 2 = 0. – Mizzle Jul 18 '19 at 06:09
-
Possible duplicate https://stackoverflow.com/questions/53234525/find-all-combinations-of-a-set-of-numbers-that-add-up-to-a-certain-total – Ronak Shah Jul 18 '19 at 06:29
-
I also tried another way. For example, say the sum is 30. I generate a random even integer between 0 and 30 from a discrete uniform distribution. Suppose I get 16. Then the number of 1s is 16, the number of 2s is (30-16)/2 = 7, and the rest are 0s, i.e. the number of 0s is 100-16-7 = 77. With the `sample` command, I can randomly reorder the 100 numbers in the vector. However, because I assume a discrete uniform distribution when generating the random even integer, a vector whose sum is 30 and a vector whose sum is 50 end up with the same pattern. – Mizzle Jul 18 '19 at 06:29
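The counting approach described in the comment above can be sketched in a few lines. This is only an illustration: `sim_genotypes` and its arguments are hypothetical names, and drawing the number of 2s uniformly from the feasible range is an assumption that could be replaced by any other distribution. The sketch also enforces the requirement from the comments that the vector contains at least one 1 and at least one 2.

```r
# Hypothetical helper: build a 0/1/2 vector of length n with a given sum
# by choosing the counts directly, so no rejection loop is needed.
sim_genotypes <- function(target, n = 100) {
  # Candidate numbers of 2s: at least one 2, and at least one 1 left over
  # (ones = target - 2 * twos >= 1)
  twos <- seq_len(max(0, floor((target - 1) / 2)))
  # The number of zeros, n - target + twos, must not be negative
  twos <- twos[twos >= target - n]
  if (length(twos) == 0) {
    stop("No vector containing both 1s and 2s has this sum")
  }
  # Assumption: pick the number of 2s uniformly among the feasible values
  n_two <- twos[sample.int(length(twos), 1)]
  n_one <- target - 2 * n_two
  n_zero <- n - n_one - n_two
  # Assemble the counts and shuffle the positions
  sample(rep(0:2, times = c(n_zero, n_one, n_two)))
}

v <- sim_genotypes(30)
table(v)  # counts of 0s, 1s and 2s; they change from call to call
sum(v)
```

Because the counts are fixed before shuffling, the sum is correct by construction and the running time does not depend on luck.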
2 Answers
2
Here is one pretty simple attempt to solve this problem. Instead of sampling all 100 elements, it makes use of the fact that there must be at least `100 - target` zeros, so only `target` elements need to be sampled. I think there might also be a way to use the fact that there can be at most `100 - (target / 2)` zeros (if all the nonzero elements are 2).
```r
sim_freq = function(target, total_size = 100, max_attempts = 100) {
  # At most `target` elements can be nonzero, so this many zeros are guaranteed
  min_zeros = total_size - target
  target_found = FALSE
  attempts = 0
  while (! target_found) {
    # Sample only the `target` elements that are not guaranteed zeros
    alleles = sample(0:2, size = target, replace = TRUE)
    target_found = sum(alleles) == target
    attempts = attempts + 1
    if (attempts > max_attempts) {
      stop("Couldn't find a match")
    }
  }
  print(paste0("Found a match in ", attempts, " attempts."))
  # Shuffle the generated alleles and zeros together
  sample(c(alleles, rep(0, min_zeros)))
}
```
Usage:

```r
sim_freq(26)
sim_freq(77)
```
In my test runs with targets of 26 and 77, it generally finds a vector with the desired sum in fewer than 20 attempts, but that might vary a lot for different targets.
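How the number of attempts varies with the target can be checked exactly rather than by trial runs. The sketch below computes the probability that a single draw of `target` values from 0:2 sums to `target`, by repeatedly convolving the uniform distribution on {0, 1, 2} with itself. The name `hit_prob` is hypothetical and not part of the answer's code.

```r
# Hypothetical helper: probability that `target` independent uniform draws
# from 0:2 sum to exactly `target`.
hit_prob <- function(target) {
  pmf <- 1  # distribution of an empty sum: P(sum = 0) = 1
  for (i in seq_len(target)) {
    # Adding one more draw shifts the distribution by 0, 1 or 2,
    # each with probability 1/3
    pmf <- (c(pmf, 0, 0) + c(0, pmf, 0) + c(0, 0, pmf)) / 3
  }
  pmf[target + 1]  # pmf[k + 1] holds P(sum = k)
}

1 / hit_prob(26)  # expected number of attempts for a target of 26
1 / hit_prob(77)  # expected number of attempts for a target of 77
```

Since each attempt succeeds independently with probability `hit_prob(target)`, the number of attempts follows a geometric distribution, and `(1 - hit_prob(target))^100` approximates the chance of exhausting the default `max_attempts`.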

Marius
0
Here is some code to do it. I used 15 elements so that it runs faster:
```r
x <- 0:2  # values allowed in the vector
y <- 10   # desired sum of the vector
b <- 0    # initialize b
# repeat until the sum of the elements equals the desired sum
while (b != y) {
  a <- sample(x, 15, replace = TRUE)  # draw a random vector of 15 elements
  b <- sum(a)                         # sum of the elements
}
a  # desired vector
```
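With 100 elements and a sum such as 30, the loop above would almost never finish, because uniform sampling from 0:2 gives an expected sum of 100. One possible workaround, sketched here as an assumption rather than as part of this answer, is to weight the sampling so that the expected sum equals the target. The function name `sim_weighted`, the attempt cap, and the choice to draw the probability of a 2 at random are all illustrative.

```r
# Hypothetical variant: rejection sampling with probabilities chosen so that
# the expected sum of the vector equals the target.
sim_weighted <- function(target, n = 100, max_attempts = 10000) {
  m <- target / n  # required mean per element (must lie between 0 and 2)
  # Draw P(2) at random from the range that keeps all probabilities >= 0
  p2 <- runif(1, max(0, m - 1), m / 2)
  p1 <- m - 2 * p2    # makes the mean p1 + 2 * p2 equal to m
  p0 <- 1 - p1 - p2
  probs <- pmax(c(p0, p1, p2), 0)  # guard against tiny negative rounding
  for (attempt in seq_len(max_attempts)) {
    a <- sample(0:2, n, replace = TRUE, prob = probs)
    if (sum(a) == target) return(a)
  }
  stop("No match within max_attempts")
}

a <- sim_weighted(30)
sum(a)
table(a)
```

Because the probability of a 2 is redrawn on every call, repeated calls give vectors with different mixes of 1s and 2s. The sketch does not guarantee that both 1s and 2s appear, so that would need an extra check if it is a hard requirement.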

Ipa