Outcome of a simulated dice and coin toss in R

Question

The experiment involves rolling a fair die to get a score x, then tossing a fair coin x times and recording the number of tails. I need to do this experiment 50 times and record the outcomes in a vector (which I'll then use to plot a histogram).

This is my code so far:

    for (i in 1:100)
    {X <- sample(6,1,replace=TRUE,c(1,1,1,1,1,1)/6)
    Y <- sample(2,1,replace=TRUE,c(1,1)/2)}
    Youtcomes <- c(sum(Y))
    Youtcomes

But instead of giving me a vector with 100 elements, I keep getting just a single number. Where am I going wrong?

Note: I have to use a for loop.

You're overwriting `X` and `Y` in each iteration of your loop... — Joshua Ulrich, Mar 06 '13 at 22:18
Your loop assigns a single value to X and Y each time, and overwrites. There is no connection between X and Y as your description wants. You then take the sum of a single value to give Youtcomes. The result of sum should be a single vector — mnel, Mar 06 '13 at 22:20
Sorry, I just realized I misread your assignment. You need to roll a die 50 times, and then flip a coin however many times the score on the die says, right? — TARehman, Mar 06 '13 at 22:27
Regarding your note: whoever set this question is teaching you to do stupid things in R. This is not C or any other non-vectorised language. In R we tend not to iterate over loops when vectorised solutions exist. If your question is "How do I do this via a loop?" then it is a silly R question, **too localized** to be of much use to anyone at a later date and I will vote to close. — Gavin Simpson, Mar 06 '13 at 22:42
@GavinSimpson Well, given how many solutions are being given showing that it can & SHOULD be done without using a loop...maybe it would be worth keeping open as an example? — TARehman, Mar 06 '13 at 22:44
@TARehman I see two done properly without loops - mine and @mnel's. The `lapply()` is still a loop and generates 100 calls to `sample()`. Those answers are poor R code for *this* question. — Gavin Simpson, Mar 06 '13 at 22:55

score 6 · Answer 1 · answered Mar 06 '13 at 22:38

Use the fact that R is vectorized. You can then use a binomial distribution to replicate the coin toss.

    heads <- rbinom(size = sample(6,100, replace = TRUE), n=100, prob = 0.5)
    sum(heads)
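Note that `sum()` collapses the result to a single total, whereas the histogram in the question needs one count per trial. A minimal sketch that keeps the whole vector might look like this; the names `n_trials`, `rolls` and `tails` and the seed are illustrative assumptions, not part of the original answer:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

n_trials <- 100                                       # hypothetical name
rolls <- sample(6, n_trials, replace = TRUE)          # one die score per trial
tails <- rbinom(n_trials, size = rolls, prob = 0.5)   # tails among rolls[i] tosses

length(tails)   # 100, one count per trial

# Bins centred on the integers so each count 0..6 gets its own bar
hist(tails, breaks = seq(-0.5, 6.5, by = 1))
```

Since a fair coin is symmetric, counting "successes" from `rbinom()` as tails rather than heads makes no difference to the distribution.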

answered Mar 06 '13 at 22:38

mnel


I believe the user has to use a `for()` loop for their assignment. – TARehman Mar 06 '13 at 22:39
@TaRehman -- That would be inefficient here. SO is not a help desk for assignments. I'm answering the question `what am I doing wrong` in relation to the description. I'd suggest the `for` is included here. – mnel Mar 06 '13 at 22:41
Oh, no doubt. I just think it's a homework question where he/she needs to use a loop. – TARehman Mar 06 '13 at 22:42

score 6 · Answer 2 · answered Mar 06 '13 at 22:40

Perhaps I have missed something, but what is wrong with one call to sample() to do the 100 rolls of the dice, and then plug that into rbinom() to do the coin tosses? We pass the output from sample() to the size argument

    > set.seed(1)
    > rbinom(100, size = sample(6, 100, replace = TRUE), prob = 0.5)
      [1] 1 1 1 6 1 2 2 2 3 1 2 1 2 1 1 0 3 1 1 3 6 1 2 0 2 1 1 1 2 2 2 1 0 1 4 3 3
     [38] 1 5 2 3 2 2 1 3 2 0 2 1 4 2 3 1 1 1 0 1 1 1 1 2 2 1 2 3 1 0 2 1 2 2 4 2 1
     [75] 1 5 3 2 3 5 1 2 3 1 4 0 3 1 2 1 1 0 1 5 2 3 0 2 2 3
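The reason this works is that `rbinom()` accepts a vector for `size` and pairs each element with one draw. A small sketch of that behaviour follows; the `sizes` vector and the large-sample check are illustrative assumptions added here, not part of the original answer:

```r
set.seed(1)

sizes <- c(1, 3, 6)                       # three hypothetical die scores
draws <- rbinom(length(sizes), size = sizes, prob = 0.5)
draws <= sizes                            # each count is bounded by its own number of tosses

# Sanity check on a large sample: the expected number of tails is
# E[die score] * 0.5 = 3.5 * 0.5 = 1.75, so the mean should be close to that.
big <- rbinom(1e5, size = sample(6, 1e5, replace = TRUE), prob = 0.5)
mean(big)
```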

answered Mar 06 '13 at 22:40

Gavin Simpson


Because I need to use a for loop as that's what we're learning about in R at the moment. – Mathlete Mar 06 '13 at 22:43
Then you are being taught badly! This is not how to answer the question. I can think of many reasons to use a `for()` loop and do so regularly as I do not possess the rabid fear of `for()` that some of the `apply`-loving R coders exhibit. However, this is **not** a situation that calls for `for()`. Stack Overflow is *not* your private "help me write crappy R code Helpdesk" and if that is your question you'll be getting a close vote from me. Stack Overflow is not about you but about curating *the* best programming resource. Please focus your questions accordingly. – Gavin Simpson Mar 06 '13 at 22:48
Well that's not something I can control. I didn't know it was 'crappy' code, I've only been using R for a week. I'm more than open to learning about making my codes more efficient and I've actually learnt a lot from some of the answers posted here. – Mathlete Mar 06 '13 at 22:52
The rant of @GavinSimpson was directed more at your teacher I think... – Paul Hiemstra Mar 07 '13 at 09:44

Arun · Accepted Answer · 2013-03-07T10:00:02.290

4

Disclaimer: (very inefficient solution; see mnel's/Gavin's solution)

As you can see from the many, many comments underneath each of the answers, this answer attempts to address the OP's specific question (however inefficient the requirements may be). In the spirit of maintaining the decorum of the forum, some have (rightly) pointed out that the question is in bad taste and that my answer doesn't do justice to the forum's requirements. I accept all criticism and leave the answer here only for the obvious reasons (it is marked as the answer, and for continuity). I suggest you look at mnel's or Gavin's answer for a vectorised solution to this specific problem. If you're interested in a for-loop implementation, refer to the bottom of this post, but treat it as an illustration of how a for-loop is structured rather than as the way to solve this specific problem. Thank you.

Your code is riddled with quite a few problems, apart from the main problem @Joshua already mentioned:

First, you overwrite the values of X and Y on every iteration, so at the end of the loop only the last value of Y is left to be summed.

Second, your code for Y is not correct. You say you have to do x coin tosses, yet you use sample(2, 1, ...). The 1 must be replaced with X, the number from the die roll.

Try out this code instead:

    Youtcomes <- sapply(1:100, function(x) {
        X <- sample(1:6, 1, replace=TRUE, rep(1,6)/6)
        Y <- sample(c("H", "T"), X, replace=TRUE, rep(1,2)/2)
        sum(Y == "T")
    })

Here, we loop 100 times, each time sampling a value between 1 and 6 and storing it in X. Then, we sample either head (H) or tail (T) X times and store the result in Y.

Now, sum(Y == "T") gives the number of tails for the current iteration x (1 <= x <= 100). So, at the end, Youtcomes will be your vector of simulated tail counts.

Then, you can do a hist(Youtcomes).
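For anyone unsure what `sapply()` is doing here, a small sketch may help: it calls the function once per element of its first argument and collects the results into a vector. The helper name `one_trial` is a hypothetical name introduced for illustration, and `replicate()` is the alternative mentioned in the comments for the case where the index itself is not used:

```r
# sapply() applies the function to 1, 2, ..., 5 and returns a vector
squares <- sapply(1:5, function(i) i^2)
squares

set.seed(1)

# One run of the experiment: roll the die, toss the coin that many times
one_trial <- function() {                 # hypothetical helper
  x <- sample(6, 1)
  sum(sample(c("H", "T"), x, replace = TRUE) == "T")
}

# replicate() repeats an expression and simplifies the results to a vector
Youtcomes <- replicate(100, one_trial())
length(Youtcomes)
```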

Edit: If it's a for-loop solution that's desired, then:

    # Always preallocate the vector you'll index inside the for-loop;
    # otherwise the object grows on every iteration and the entire
    # object is copied for every i, which makes it extremely
    # slow/inefficient.
    Youtcomes <- rep(0, 100)
    for (i in 1:100) {
        X <- sample(1:6, 1, replace=TRUE, rep(1,6)/6)
        Y <- sample(c("H", "T"), X, replace=TRUE, rep(1,2)/2)
        # assign output inside the loop with [i] indexing
        Youtcomes[i] <- sum(Y == "T")
        # Since Youtcomes was filled with 100 zeros beforehand,
        # each value replaces the 0 at position i. Thus the object
        # is not copied every time, which is faster and more efficient.
    }
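To get a feel for the efficiency difference discussed in the comments, here is a rough timing sketch. The function names `loop_version` and `vectorised_version` and the choice of `n` are assumptions made for illustration; actual timings depend on the machine, so none are quoted:

```r
n <- 1e4  # larger than the 100 trials in the question, so the gap is measurable

# Preallocated for-loop: two sample() calls per trial
loop_version <- function(n) {
  out <- integer(n)
  for (i in seq_len(n)) {
    x <- sample(6, 1)
    out[i] <- sum(sample(c("H", "T"), x, replace = TRUE) == "T")
  }
  out
}

# Vectorised: one sample() call and one rbinom() call in total
vectorised_version <- function(n) {
  rbinom(n, size = sample(6, n, replace = TRUE), prob = 0.5)
}

system.time(a <- loop_version(n))
system.time(b <- vectorised_version(n))
```

For 100 trials either version finishes almost instantly; the difference only starts to matter as `n` grows.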

edited Mar 07 '13 at 10:00

answered Mar 06 '13 at 22:27

Arun


I see what you mean. That's really helpful thanks. Can I ask what the sapply function does? I have googled it but I'm a little confused. – Mathlete Mar 06 '13 at 22:30
This "sample(6, 1) will always give only one value = 6" is incorrect. I just tried it and got `3`. `sample()` has multiple ways to interpret its first argument `x`; a single positive integer means the sample is taken from `1:x`. – Gavin Simpson Mar 06 '13 at 22:30
@GavinSimpson, I stand corrected. I'll edit. – Arun Mar 06 '13 at 22:31
@Mathlete sapply() applies a specific function to each element of a vector. I'll show a bit more in my answer. – TARehman Mar 06 '13 at 22:31
I have to use a for loop unfortunately. How would I change sapply for this? – Mathlete Mar 06 '13 at 22:32
@Mathlete, the edit has the loop solution. – Arun Mar 06 '13 at 22:37
Thanks, that's great. I understand where I was going wrong now. – Mathlete Mar 06 '13 at 22:43
@Mathlete You do realise that this results in 200 calls to `sample()` where 1 call plus one call to `rbinom` would suffice. It is stupid R code - you are asking us to abuse our beloved friend and it is wicked of you or your "teacher" to make us do so. – Gavin Simpson Mar 06 '13 at 22:44
@GavinSimpson Slightly harsh, no? I'm just trying to get through my degree and this is one of the tasks I've been given. I thought SE was a forum for help. – Mathlete Mar 06 '13 at 22:49
@Mathlete Nope it is not *just* a help forum. It aspires to be more than that, hence the "too localized" close vote option. Questions should be relevant to more than just you or a specific point in time. They should help people learn to do things correctly and the Q&A becomes an enduring resource of high quality answers. You'll get short shrift here if you insist on inefficient, badly implemented solutions just so you can follow an instructor's requirements. – Gavin Simpson Mar 06 '13 at 22:54
you can use `replicate(100,...)` or `plyr::raply(100,...)` instead of `sapply(1:100,...)` – Ben Bolker Mar 06 '13 at 22:55
-1 Sorry but I feel compelled to downvote for suggesting inefficient code generating hundreds of function calls just because the OP wants to do it that way. – Gavin Simpson Mar 06 '13 at 22:57
@GavinSimpson, I disagree with the way you see it, but it is not an issue. – Arun Mar 06 '13 at 23:04
@GavinSimpson I never said it was JUST a help forum. – Mathlete Mar 06 '13 at 23:09
@Arun thank you for taking the time to help me with my code. I really appreciate it. – Mathlete Mar 06 '13 at 23:09
- -1 :) your edit is sufficient a health warning for me to remove my downvote. Buyer beware as they say! – Gavin Simpson Mar 07 '13 at 16:13

Simon O'Hanlon · Answer 4 · 2013-03-06T22:40:35.757

1

Arun beat me to it. But another of the many, many ways could be (if I understand your desired outcome correctly):

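    X <- sample(6,100,replace=TRUE,c(1,1,1,1,1,1)/6)
    # factor() with explicit levels keeps a zero count for "T" when a trial has no tails
    Y <- lapply(X , function(x){ res <- sample( c( "H" , "T" ) , x , replace=TRUE , c(1,1)/2 ) ; table( factor( res , levels = c( "H" , "T" ) ) ) } )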

You want to histogram the results....

    res <- unlist(Y)
    hist( res[names( res )=="T"] )
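One thing worth checking with `table()` is how trials with no tails are handled: a category that never occurs does not appear in the result unless the input is a factor with explicit levels. The sketch below shows the difference, plus a `vapply()` variant that counts tails directly; the variable names and seed are illustrative assumptions:

```r
res <- c("H", "H", "H")                        # a trial that produced no tails
table(res)                                     # only an "H" entry appears
table(factor(res, levels = c("H", "T")))       # "T" is listed with a count of 0

set.seed(1)
X <- sample(6, 100, replace = TRUE)

# vapply() insists that every result is a single integer
tails <- vapply(
  X,
  function(x) sum(sample(c("H", "T"), x, replace = TRUE) == "T"),
  integer(1)
)
length(tails)                                  # 100, zero counts included
```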

edited Mar 06 '13 at 22:40

answered Mar 06 '13 at 22:32

Simon O'Hanlon


I don't think there's a need for the last parameter in either `sample()` because the vector of weights defaults to uniform weighting, which is what is being used here. – Simon Mar 06 '13 at 22:35
I need to use a for loop though. – Mathlete Mar 06 '13 at 22:36
@Mathlete Why? R is designed for vectorised operations. Why do you specifically need to use a loop? – Simon O'Hanlon Mar 06 '13 at 22:37
@Simon v. true! Cheers, Simon :-) – Simon O'Hanlon Mar 06 '13 at 22:37
@SimonO101 Because that's what we're doing in R at the moment. – Mathlete Mar 06 '13 at 22:39
@Mathlete that reasoning kinda makes me want to tear my hair out (what little of it is left!). I guess you should give Arun a green tick as he has written you a nice loop. – Simon O'Hanlon Mar 06 '13 at 22:41
Well `lapply()` is just hiding the loop... – Gavin Simpson Mar 06 '13 at 22:45
@GavinSimpson yes, true. Is there any difference in efficiency of the implementation though? – Simon O'Hanlon Mar 06 '13 at 22:46
@SimonO101 I'm not saying I'm not interested in learning how to make this code more efficient. It's just that I have been instructed to use a for loop for this exercise and I thought I could ask for help in doing so, without being judged for it. I guess not. – Mathlete Mar 06 '13 at 22:50
I guarantee that the answer from @mnel or my own will be far quicker than any `lapply()` or `for()` based solution. Just count the number of function calls. Mnel and I have 2 calls, yours has 101 just to do the sampling if I ignore the `table()` call. – Gavin Simpson Mar 06 '13 at 22:51
@GavinSimpson I'm not disagreeing with you or making any comment as to the most efficient algo! It wasn't what I was asking about. I was simply asking the question in relation to `for` and `lapply`! – Simon O'Hanlon Mar 06 '13 at 22:54
@Mathlete It's not a question of being judged. You asked... "how do I do this". Not "how do I do this with a for loop". People aren't trying to judge you, they're trying to help you. I hope you will not be put off using SO as it's a great resource. – Simon O'Hanlon Mar 06 '13 at 22:58
http://stackoverflow.com/questions/6460827/what-are-the-advantages-of-the-apply-functions-when-are-they-better-to-use-th/6461438#6461438 – Ben Bolker Mar 06 '13 at 22:58
@BenBolker thanks. And by extension, thanks Gavin for the linked answer! – Simon O'Hanlon Mar 06 '13 at 23:00
@GavinSimpson Thanks Gavin. And I am currently enjoying your more detailed explanation in the question Ben linked to. Cheers. – Simon O'Hanlon Mar 06 '13 at 23:01
-1 Sorry but I feel compelled to downvote for suggesting inefficient code generating hundreds of function calls just because the OP wants to do it that way. – Gavin Simpson Mar 06 '13 at 23:01
@GavinSimpson if it wasn't so trivial with such a short runtime I might have thought about 'hundreds of functions calls' as an issue and addressed this. – Simon O'Hanlon Mar 06 '13 at 23:12
@SimonO101 I really appreciate all the help that I've received on this question. As a total R newbie, I feel slightly attacked hearing my code is 'crappy,' I'm sorry I can't go from zero to amazing in the space of a week, when it comes to my coding skills. – Mathlete Mar 06 '13 at 23:14
@Mathlete For the record, I only took umbrage at the fact you kept telling contributors that you *had* to do it with a `for` loop. I also didn't call your code crappy - that was a flippant summary of what you were expecting from Stack Overflow. – Gavin Simpson Mar 07 '13 at 01:35
