
I have a 2D NumPy array called pmf in which each column is a probability mass function. I would like to sample one row index per column according to that column's probability mass function. I currently have:

[np.random.choice(len(col), p=col) for col in pmf.T]

Is there a cleaner NumPy way?

As an example:

pmf = np.array([[0.1215122141454291, 0.02526931631702081],
                [0.44580918821894255, 0.23465429357912862],
                [0.13924078748403307, 0.7311930810111874],
                [0.2934378101515954, 0.008883309092663008]])

Then the output could be:

[3, 2]
Simd
  • Please share an input 2d array and what you expect the output to be. – Abdou Dec 06 '16 at 18:51
  • Does the given sample work on the listed code? – Divakar Dec 06 '16 at 19:02
  • @Divakar Are you asking if the code works? It works for me after my latest edit. – Simd Dec 06 '16 at 19:09
  • It works now after the edit :) – Divakar Dec 06 '16 at 19:10
  • Based on the linked dup target, use: `(pmf.cumsum(0) > np.random.rand(pmf.shape[1])).argmax(0)`. If it doesn't work, let me know. – Divakar Dec 06 '16 at 19:16
  • @Divakar Thank you. Wouldn't searchsorted make more sense than argmax here? That is assuming I understand what you are doing. – Simd Dec 06 '16 at 19:21
  • In my experience, `argmax()` is quite fast. `searchsorted` could be used as well, but it would take a few more steps to get a vectorized solution. One such method would be something like this: http://stackoverflow.com/a/40588862/3293881 – Divakar Dec 06 '16 at 19:23
  • Some nice functionality for this here: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.rv_discrete.html#scipy.stats.rv_discrete – Benjamin Dec 06 '16 at 19:24
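
For reference, the `cumsum`/`argmax` one-liner from the comment above can be unpacked roughly as follows. This is only a sketch: the function name `sample_rows_argmax` and the optional `rng` parameter are made up for illustration, and `np.random.default_rng` assumes NumPy 1.17 or later (the comment itself uses `np.random.rand`). The idea is inverse-CDF sampling: build each column's cumulative distribution, draw one uniform number per column, and take the first row whose cumulative value exceeds it.

```python
import numpy as np

# Example pmf from the question; each column sums to 1.
pmf = np.array([[0.1215122141454291, 0.02526931631702081],
                [0.44580918821894255, 0.23465429357912862],
                [0.13924078748403307, 0.7311930810111874],
                [0.2934378101515954, 0.008883309092663008]])


def sample_rows_argmax(pmf, rng=None):
    """Draw one row index per column of pmf (hypothetical helper name)."""
    rng = np.random.default_rng() if rng is None else rng
    cdf = pmf.cumsum(axis=0)          # per-column cumulative distribution
    u = rng.random(pmf.shape[1])      # one uniform draw in [0, 1) per column
    # argmax over a boolean array returns the first True along the axis,
    # i.e. the first row whose cumulative probability exceeds the draw.
    return (cdf > u).argmax(axis=0)


print(sample_rows_argmax(pmf))  # an array of two row indices, one per column
```

One caveat worth keeping in mind: if rounding leaves the last cumulative value of a column slightly below 1 and the uniform draw lands above it, every comparison is False and `argmax` returns 0. That should be very rare, but dividing the cumulative sums by their last row is one way to reduce the risk.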

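On the `searchsorted` question raised in the comments: `np.searchsorted` works on a single sorted 1-D array, so a vectorized version needs an extra step. One possible way (not necessarily what the linked answer does) is to shift column j's cumulative distribution into the interval [j, j + 1], flatten column by column so the result is one sorted array, and shift the uniform draws the same way. The helper names `rows_from_uniforms` and `sample_rows_searchsorted` below are hypothetical, and the sketch assumes every column is a valid pmf with a nonzero sum.

```python
import numpy as np


def rows_from_uniforms(pmf, u):
    """Map one uniform draw per column to a row index (hypothetical helper)."""
    pmf = np.asarray(pmf, dtype=float)
    n_rows, n_cols = pmf.shape
    cdf = pmf.cumsum(axis=0)
    cdf = cdf / cdf[-1]               # make the last row exactly 1
    offsets = np.arange(n_cols)
    # Shift column j into [j, j + 1] so the column-by-column flattening is sorted.
    flat = (cdf.T + offsets[:, None]).ravel()
    # side="right" finds the first entry strictly greater than each shifted draw.
    pos = np.searchsorted(flat, np.asarray(u) + offsets, side="right")
    # Convert flat positions back to row indices; the clip guards against rounding.
    return np.minimum(pos - offsets * n_rows, n_rows - 1)


def sample_rows_searchsorted(pmf, rng=None):
    """Draw one row index per column of pmf (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    return rows_from_uniforms(pmf, rng.random(np.shape(pmf)[1]))


# Example pmf from the question.
pmf = np.array([[0.1215122141454291, 0.02526931631702081],
                [0.44580918821894255, 0.23465429357912862],
                [0.13924078748403307, 0.7311930810111874],
                [0.2934378101515954, 0.008883309092663008]])
print(sample_rows_searchsorted(pmf))  # an array of two row indices
```

Adding the column offset costs some floating-point precision when there are very many columns, which may be one reason to prefer the simpler `argmax` form unless the number of rows is large enough for the binary search to pay off.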