0

I have the following problem, which is a re-designed homework problem I am looking to put inside an upcoming tutorial.

The core idea is something like this.

I have 800,000 coins to be distributed amongst 900,000 slots. The brute-forced python-loop simulation would look like this:

import random
for coin in coins: #assume we have a list of coins
    slot = random.choice(slots) #assume we have a list of slots, which are sets.
    slot.add(coin)

Now, if we're doing some small number of coins in some small number of slots, that's trivially fast. But with hundreds of thousands of coins and slots, that's when some speed optimization might be really nice. Might there be a way to do this using numpy?

FWIW I've seen the solutions presented in How to create a list of random integer vector whose sum is x and Generate multiple random numbers to equal a value in python, but none of them are vectorized.

Community
  • 1
  • 1
ericmjl
  • 13,541
  • 12
  • 51
  • 80

1 Answers1

3

What about this:

import numpy as np
coins = 800000
slots = 900000

coin_positions = np.random.randint(slots, size=coins)

coins_per_slot = np.histogram(coin_positions, bins=np.arange(slots + 1))[0]

If there are much fewer slots than coins, you can use a binomial distribution to calculate a random number of coins for each slot: to distribute n coins among k slots, generate x ~ B(n, 1/k). Then the kth slots contains x coins, and you go on distributing n-x coins among k-1 slots:

for i in range(k, 0, -1):
    x = np.random.binomial(n, 1./i)
    coins_per_slot[i-1] = x
    n -= x
Lynn
  • 10,425
  • 43
  • 75
  • 2
    [`np.bincount`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html) is probably a better option than `np.histogram` for this task. – Jaime Apr 11 '15 at 16:28