
I am simulating flipping 999 coins 1000 times and drawing a distribution of the sample mean, which takes a long time (about 21 seconds). Is there a better way to do this? A faster way to run the for loop, for instance? Would vectorizing be useful?

import datetime
import numpy as np

sample_mean_dis = []
start_time = datetime.datetime.now()
# to draw a distribution of sample mean
for i in range(1000):
    if not (i%100):
        print('iterate: ', i)
    sums_1000coins = []
    # simulate 1000 repetitions of experiment_1,
    # treat that run as one sample, and compute the sample mean
    for _ in range(1000):
        # experiment_1: flip 999 coins and count the heads
        coins = np.random.randint(2, size=999)
        sums_1000coins.append(np.sum(coins == 1))
    sample_mean_dis.append(np.mean(sums_1000coins))
end_time = datetime.datetime.now()
elapsedTime = end_time - start_time
print("Elapsed time: %d seconds" % (elapsedTime.total_seconds()))
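Since the question asks whether vectorizing would help: a minimal sketch of the same simulation done in one NumPy call, using the fact that the sum of 999 fair coin flips is a single draw from Binomial(999, 0.5) (the array shape and variable names here are illustrative choices, not from the original code):

```python
import numpy as np

# Each "flip 999 coins and count heads" is one draw from Binomial(999, 0.5),
# so all 1000 samples of 1000 experiments can be drawn in a single call.
sums = np.random.binomial(n=999, p=0.5, size=(1000, 1000))
# Average each row of 1000 experiments to get 1000 sample means.
sample_mean_dis = sums.mean(axis=1)
```

This removes both Python-level loops entirely, so it should run in a fraction of a second.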
  • Isn't 1000 x 1000 a million? – Peter Wood Feb 22 '19 at 22:47
  • 3
    The major slow-down is that you're flipping 999 coins 1000 times, and doing 1000 trials of *that*. The total is 999 *million* flips. – Prune Feb 22 '19 at 23:02
  • Yes, you are right. So is there a better way to do this, like vectorizing or parallel computing? Any inspiration would be appreciated. –  Feb 22 '19 at 23:07
  • What computer specs are you using? Running the program in parallel would speed it up. But even that may be too long based on @Prune's point about how many iterations you are running. I used to run simulations in my lab that would take a day to complete due to the complexities, sometimes you just gotta deal with it. – Edeki Okoh Feb 22 '19 at 23:15
  • @EdekiOkoh would you please give a hint on how to run my trials in parallel? Just split them into 10 pieces of 100 trials each and put them in multiprocessing? –  Feb 22 '19 at 23:36
  • 1
    @brennn [Start here](https://www.journaldev.com/15631/python-multiprocessing-example) and work your way through the process. Multiprocessing using python isn't like normal packages since it is dependent on your hardware also. I would read up on it first before deciding implement it. But to point out you are doing 999,000,000 flips, so it may take a bit. Im sure what every distribution you want to get from the flips can be seen without that many trials. – Edeki Okoh Feb 22 '19 at 23:38

1 Answer

To flip 999 coins and see which come up heads, read 999 bits of random data (a bit can either be 0 or 1 with probability 50/50, just like a coin) and then count how many bits are set to 1.

import random
bin(random.getrandbits(999)).count("1")

The above will return a number close to 499.5, the expected number of heads (999/2).

To flip 999 coins 1000 times, repeat the above in a list comprehension:

num_heads = [bin(random.getrandbits(999)).count("1") for _ in range(1000)]

num_heads will be a list of 1000 integers, approximately normally distributed around 499.5 (999/2).
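To get back to the distribution of sample means from the question, the same idea nests once more; a sketch (the helper name `sample_mean` is an illustrative choice):

```python
import random

def sample_mean():
    # One sample: 1000 experiments of 999 flips each;
    # return the mean heads-count over those 1000 experiments.
    return sum(bin(random.getrandbits(999)).count("1") for _ in range(1000)) / 1000

# Distribution of 1000 sample means, each centred near 499.5.
sample_mean_dis = [sample_mean() for _ in range(1000)]
```

This still makes one `getrandbits` call per experiment, but each call replaces a 999-element NumPy array allocation, which is where the speedup over the original code comes from.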

Boris Verkhovskiy
  • Sorry, I didn't read your code carefully or Prune's comment. You're not doing what you said in the title. But this is twice as fast as your code. – Boris Verkhovskiy Feb 23 '19 at 00:07
  • Thanks for your code, it helps a lot. And what do you mean by saying 'You're not doing what you said in the title'? Are you indicating my code is NOT simulating a distribution of sample mean? plz give me a hint, thanks in advance. –  Feb 23 '19 at 02:42
  • 1
    He mean that you said that you are simulating the toss of 999 coins 1000 times, but that is not what your code does. Your code is doing that a thousand times and getting their means. – Poshi Feb 23 '19 at 09:04