I have as input two jagged arrays: The first is an array of arrays of values and the second is an array of arrays of probabilities corresponding to the values of the former array. For each row of values I would like to draw one value according to the corresponding array of probabilities.

Below is a non-vectorized example:

import numpy as np

values = [[1, 2, 4], [19, 8], [7, 6, 1, 2], [5, 0]]
probabilities = [[0.1, 0.1, 0.8], [0.5, 0.5], [1, 0, 0, 0], [0.25, 0.75]]

# Draw one value per row, weighted by that row's probabilities.
output = [np.random.choice(row, p=probs) for row, probs in zip(values, probabilities)]
print(output)

Example output (the draw is random, so results vary between runs):

[4, 19, 7, 0]

I tried using np.vectorize for each pair of values and probabilities but it provided no speedup. Is there a way to vectorize this type of random choice, either by using np.vectorize or not?

  • `np.vectorize` is just a Python `for` loop in disguise (the docs say it's provided for convenience). I'm not sure it's possible to have actual numpy vectorized speeds for this but I would be happy to be proven incorrect. – roganjosh Aug 19 '18 at 12:11
  • I can't test atm but my attempt would be to pad the sample and probability arrays with 0 values to get a square matrix and see if `np.random.choice` will accept 2D arrays – roganjosh Aug 19 '18 at 12:15
  • Those are lists, not arrays. – juanpa.arrivillaga Aug 19 '18 at 13:21
  • If those were numpy arrays of numpy arrays, would `np.vectorize` give any speedup? I have seen [here](https://stackoverflow.com/questions/34502254/vectorizing-haversine-distance-calculation-in-python) an example where `np.vectorize` does speed things up. – Yinon Douchan Aug 19 '18 at 13:39
  • In your link the `np.vectorize` solution is used with pandas `groupby` and `apply`. So it isn't a good indication of relative speed of `vectorize` itself. Note that @Divakar's solution is much faster. – hpaulj Aug 19 '18 at 15:38
  • `choice` is inherently a 1d operation. It is compiled, and complains if the inputs are not 1d and the `p` don't sum to 1. So it has to be called once for each pair of values and probabilities. The iteration mechanism can't improve on that. – hpaulj Aug 19 '18 at 15:58
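
The padding idea raised in the comments can be combined with inverse-CDF sampling, which avoids calling `np.random.choice` once per row. The following is only a sketch under stated assumptions: `sample_rows` and its `rng` parameter are hypothetical names introduced here, every row is assumed to be non-empty, and each row's probabilities are assumed to be non-negative with a positive sum. It pads both inputs to rectangular arrays, takes a cumulative sum along each row, draws one uniform number per row, and picks the first column whose cumulative probability exceeds that number. Padded columns carry zero probability, so they should never be selected.

```python
import numpy as np

def sample_rows(values, probabilities, rng=None):
    """Draw one value per ragged row via padded inverse-CDF sampling (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    n_rows = len(values)
    lengths = np.array([len(row) for row in values])
    max_len = lengths.max()

    # Flatten the ragged inputs, then scatter them into zero-padded 2-D arrays.
    flat_values = np.concatenate(values)
    flat_probs = np.concatenate(probabilities).astype(float)
    mask = np.arange(max_len) < lengths[:, None]   # True where a real entry exists
    padded_values = np.zeros((n_rows, max_len), dtype=flat_values.dtype)
    padded_probs = np.zeros((n_rows, max_len))
    padded_values[mask] = flat_values
    padded_probs[mask] = flat_probs

    # Row-wise cumulative distribution; padded columns repeat the row total.
    cdf = np.cumsum(padded_probs, axis=1)

    # One uniform draw per row, scaled by the row total to absorb rounding drift.
    u = rng.random(n_rows) * cdf[:, -1]

    # Index of the first column whose cumulative probability exceeds the draw.
    idx = (cdf > u[:, None]).argmax(axis=1)
    return padded_values[np.arange(n_rows), idx]

values = [[1, 2, 4], [19, 8], [7, 6, 1, 2], [5, 0]]
probabilities = [[0.1, 0.1, 0.8], [0.5, 0.5], [1, 0, 0, 0], [0.25, 0.75]]
print(sample_rows(values, probabilities))
```

Whether this beats the list comprehension depends on the data: building the padded arrays still costs one pass over the ragged input, and memory grows with the longest row, so it is likely to help most when there are many rows of similar length.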
