
I am simulating flipping 999 coins 1000 times and drawing a distribution of the sample mean, which takes a long time (about 21 seconds). Is there a better way to do this? A faster way to run the for loop, for instance? Would vectorizing be useful?

import datetime
import numpy as np

sample_mean_dis = []
start_time = datetime.datetime.now()
# to draw a distribution of sample mean
for i in range(1000):
    if not (i%100):
        print('iterate: ', i)
    sums_1000coins = []
    # simulate 1000 repetitions of experiment_1,
    # treat that run as one sample, and compute the sample mean
    for _ in range(1000):
        # experiment_1: flip 999 coins and count the heads
        coins = np.random.randint(2, size=999)
        sums_1000coins.append(np.sum(coins == 1))
    sample_mean_dis.append(np.mean(sums_1000coins))
end_time = datetime.datetime.now()
elapsedTime = end_time - start_time
print("Elapsed time: %d seconds" % (elapsedTime.total_seconds()))
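Since the question asks whether vectorizing would help: a minimal sketch of the same simulation done in one NumPy call, using the fact that the sum of 999 fair coin flips is a single draw from Binomial(999, 0.5) (the array shape and variable names here are illustrative choices, not from the original code):

```python
import numpy as np

# Each "flip 999 coins and count heads" is one draw from Binomial(999, 0.5),
# so all 1000 samples of 1000 experiments can be drawn in a single call.
sums = np.random.binomial(n=999, p=0.5, size=(1000, 1000))
# Average each row of 1000 experiments to get 1000 sample means.
sample_mean_dis = sums.mean(axis=1)
```

This removes both Python-level loops entirely, so it should run in a fraction of a second.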
  • Isn't 1000 x 1000 a million? – Peter Wood Feb 22 '19 at 22:47
  • 3
    The major slow-down is that you're flipping 999 coins 1000 times, and doing 1000 trials of *that*. The total is 999 *million* flips. – Prune Feb 22 '19 at 23:02
  • Yes, you are right. So is there a better way to do this, like vectorizing or parallel computing? Any inspiration would be appreciated. –  Feb 22 '19 at 23:07
  • What computer specs are you using? Running the program in parallel would speed it up. But even that may be too long based on @Prune's point about how many iterations you are running. I used to run simulations in my lab that would take a day to complete due to the complexities, sometimes you just gotta deal with it. – Edeki Okoh Feb 22 '19 at 23:15
  • @EdekiOkoh would you please give a hint on how to run my trials in parallel? Just split them into 10 pieces of 100 trials each and put them in multiprocessing? –  Feb 22 '19 at 23:36
  • 1
    @brennn [Start here](https://www.journaldev.com/15631/python-multiprocessing-example) and work your way through the process. Multiprocessing using python isn't like normal packages since it is dependent on your hardware also. I would read up on it first before deciding implement it. But to point out you are doing 999,000,000 flips, so it may take a bit. Im sure what every distribution you want to get from the flips can be seen without that many trials. – Edeki Okoh Feb 22 '19 at 23:38

1 Answer

To flip 999 coins and see which come up heads, read 999 bits of random data (a bit can either be 0 or 1 with probability 50/50, just like a coin) and then count how many bits are set to 1.

import random
bin(random.getrandbits(999)).count("1")

The above will return a number close to 499.5, the expected number of heads (999/2).

To flip 999 coins 1000 times, repeat the above in a list comprehension:

num_heads = [bin(random.getrandbits(999)).count("1") for _ in range(1000)]

num_heads will be a list of 1000 integers, approximately normally distributed around 499.5 (999/2).
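To get back to the distribution of sample means from the question, the same idea nests once more; a sketch (the helper name `sample_mean` is an illustrative choice):

```python
import random

def sample_mean():
    # One sample: 1000 experiments of 999 flips each;
    # return the mean heads-count over those 1000 experiments.
    return sum(bin(random.getrandbits(999)).count("1") for _ in range(1000)) / 1000

# Distribution of 1000 sample means, each centred near 499.5.
sample_mean_dis = [sample_mean() for _ in range(1000)]
```

This still makes one `getrandbits` call per experiment, but each call replaces a 999-element NumPy array allocation, which is where the speedup over the original code comes from.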

Boris Verkhovskiy
  • Sorry, I didn't read your code carefully or Prune's comment. You're not doing what you said in the title. But this is twice as fast as your code. – Boris Verkhovskiy Feb 23 '19 at 00:07
  • Thanks for your code, it helps a lot. And what do you mean by saying 'You're not doing what you said in the title'? Are you indicating my code is NOT simulating a distribution of sample mean? plz give me a hint, thanks in advance. –  Feb 23 '19 at 02:42
  • 1
    He mean that you said that you are simulating the toss of 999 coins 1000 times, but that is not what your code does. Your code is doing that a thousand times and getting their means. – Poshi Feb 23 '19 at 09:04