
The following code models a system that can be in any of 3 states at a given time, with constant transition probabilities between those states given by the matrix `prob_nor`. Therefore, each point in `trace` depends on the previous state.

import numpy as np

n_states, n_frames = 3, 1000
state_val = np.linspace(0, 1, n_states)

# Random transition weights, with the diagonal boosted so that
# staying in the same state is the most likely outcome
prob = np.random.randint(1, 10, size=(n_states,)*2)
prob[np.diag_indices(n_states)] += 50

prob_nor = prob/prob.sum(1)[:,None] # transition probability matrix, 
                                    # row sum normalized to 1.0

state_idx = range(n_states) # integer state labels 0, 1, 2, ...
current_state = np.random.choice(state_idx)

trace = []      # noisy observations, one per frame
sigma = 0.1     # Gaussian noise around each state's value
for _ in range(n_frames):
    trace.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

The loop in the above code makes it run pretty slowly, especially when I have to model millions of data points. Is there any way to vectorize/accelerate it?

Brenlla
    'vectorize' in the strictest `numpy` sense means operating on whole arrays in compiled code. It moves the iterations to the compiled level, outside of the control of your Python code. So an inherently sequential, iterative, problem can't be 'vectorized'. Calling those `np.random` functions repeatedly for one value at a time is much slower than calling them once for many values. – hpaulj Oct 08 '18 at 16:36
    Someone just recently asked why the Python `random.random` functions were faster than the `np.random` ones. They are faster when used for one value at a time. – hpaulj Oct 08 '18 at 16:41
  • @hpaulj I think you are referring to https://stackoverflow.com/a/50790263/8033585 – AGN Gazer Oct 08 '18 at 17:22
    *"... I have to model millions of data points"* What are typical values of `n_states` and `n_frames` for the problems that you are interested in? – Warren Weckesser Oct 08 '18 at 17:32
  • @WarrenWeckesser `n_states` is roughly 2-10, but occasionally the transition probability matrix (`prob_nor`) is sparse and in that case `n_states` is 10-100. `n_frames` 1e3-1e6. `trace` has to be generated 1000s times – Brenlla Oct 08 '18 at 17:45
  • @Brenlla I've provided an edit to my answer which resolves your issue. If `n_states` is 100, and `n_frames` is 1e6, my answer will use just shy of 1 GB of RAM, and be significantly faster than putting your computations in a loop. – PMende Oct 08 '18 at 17:48
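
To illustrate hpaulj's point about per-call overhead, here is a minimal sketch (not from the thread) comparing one `np.random.normal` call per value against a single vectorized call; exact timings will vary by machine:

import time
import numpy as np

n = 100_000

# One np.random.normal call per value, as in the original loop
start = time.perf_counter()
slow = [np.random.normal(loc=0.0, scale=0.1) for _ in range(n)]
middle = time.perf_counter()

# A single call that draws all n values at once
fast = np.random.normal(loc=0.0, scale=0.1, size=n)
end = time.perf_counter()

print(f"per-value calls: {middle - start:.3f} s")
print(f"single call:     {end - middle:.3f} s")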

2 Answers


Offload the probability sampling out of the loop by precomputing, for every state, a full sequence of candidate next states:

# possible_paths[s, k] is a next state drawn from prob_nor[s, :]
possible_paths = np.vstack([
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
])

Then you can simply do a lookup to follow your path:

path_trace = [None]*n_frames
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]

Once you have your path, you can compute your trace:

sigma = 0.1
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)

Comparing timings:

Pure Python for loop:

%%timeit
trace_list = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    trace_list.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

Results:

30.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Vectorized lookup:

%%timeit
current_state = np.random.choice(state_idx)
path_trace = [None]*n_frames
possible_paths = np.vstack([
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
])
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)

Results:

641 µs ± 6.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A speedup of approximately 50x.
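
If it helps, the three snippets above can be assembled into a single function; this is only the answer's code wrapped up for reuse (the function name `simulate_trace` is my own, not part of the answer):

def simulate_trace(state_val, prob_nor, n_frames, sigma=0.1):
    """One trace via precomputed candidate transitions (same logic as above)."""
    n_states = len(state_val)
    state_idx = np.arange(n_states)

    # possible_paths[s, k]: a next state drawn from prob_nor[s, :] for frame k
    possible_paths = np.vstack([
        np.random.choice(state_idx, p=prob_nor[s, :], size=n_frames)
        for s in range(n_states)
    ])

    # Follow the chain by looking up the precomputed draws
    current_state = np.random.choice(state_idx)
    path_trace = np.empty(n_frames, dtype=int)
    for step in range(n_frames):
        current_state = possible_paths[current_state, step]
        path_trace[step] = current_state

    # Add the Gaussian noise around each visited state's value in one call
    return np.random.normal(loc=state_val[path_trace], scale=sigma)

Usage: `trace = simulate_trace(state_val, prob_nor, n_frames)`.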

PMende
  • This doesn't work. `current_state` gives the probabilities for the choice, and it changes each time. I suspect this is a Markov Chain, and this solution won't give the proper transition probability. – Matthieu Brucher Oct 08 '18 at 16:24
  • The transition matrix here is random, but I suspect it's not the case in the actual code. – Matthieu Brucher Oct 08 '18 at 16:25
  • @MatthieuBrucher Yes, it is a Markov chain. No, the code in this answer does not work – Brenlla Oct 08 '18 at 16:57
    @Brenlla I've updated my answer to vectorize as much as possible while still doing what you need it to. This should be significantly faster than your previous approach, but still requires using a `for` loop. Though be aware this will use memory on the order of `n_frames*n_states`. – PMende Oct 08 '18 at 17:17
    You have to create `possible_paths` each time a new result is computed, so the time to generate `possible_paths` should be included in the total time for your method. (It might still be significantly faster than the original--I haven't tried it.) – Warren Weckesser Oct 08 '18 at 17:49
  • @WarrenWeckesser Quite right. It makes a small difference (about a factor of 1.5). I've edited it to include that. – PMende Oct 08 '18 at 17:52
  • @pmende, you can't precompute the paths. Each time, you will get another set of possible paths with different probabilities. Your approach doesn't work because you don't understand what the algorithm is doing. – Matthieu Brucher Oct 08 '18 at 17:54
  • The fact that the paths change for each similar state is the principle of a Markov Chain. The randomness is mandatory each time you generate a new path. – Matthieu Brucher Oct 08 '18 at 17:56
  • @MatthieuBrucher I'm a physicist. I understand Markov chains and transition probabilities. You simply don't understand what the computation that I've shown is doing. – PMende Oct 08 '18 at 17:56
  • "Your approach doesn't work because you don't understand what the algorithm is doing." The addition of "simply" to that wouldn't make it more judgmental. Anyway, the process is stochastic. It doesn't matter if I compute the transition probabilities of going from `0 -> 0, 0 -> 1, 0 ->2 ...` now or when I'm thinking about the particle actually going through the transition. It's the same. – PMende Oct 08 '18 at 18:03
  • The OP asked for vectorization/acceleration, not general scalability. There are ways to memory-optimize this process for larger chains if need be in exchange for reduced speed. – PMende Oct 08 '18 at 18:04
    This method works. For each state `j` and each "time" `k`, `possible_paths[j, k]` holds a randomly generated next state. This value was precomputed, using the appropriate row from `prob_nor`. It has precomputed more than is strictly necessary to generate a path, but it does it using numpy's vectorized code, so it is much faster than the repeated calls of the original code. In a comment, Brenlla gave the expected ranges of problem parameters, which should be enough information to decide if this answer is a viable solution. – Warren Weckesser Oct 08 '18 at 18:09
  • You'll note that you can go look at the edit history and see that the code is pretty much the same at this point as when you commented "you can't precompute the paths. Each time, you will get another set of possible paths with different probabilities. Your approach doesn't work because you don't understand what the algorithm is doing." This is also evident by the fact that there are earlier comments describing portions of the code as it currently exists. – PMende Oct 08 '18 at 18:58
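
To settle the discussion above empirically, one can check that a path generated from `possible_paths` reproduces the target transition matrix; a rough sketch (my own addition, reusing the names from the answer and assuming every state is visited at least once):

# Tally observed transitions along the generated path and compare with prob_nor
path = np.asarray(path_trace)
counts = np.zeros((n_states, n_states))
for a, b in zip(path[:-1], path[1:]):
    counts[a, b] += 1

empirical = counts / counts.sum(axis=1, keepdims=True)
print(np.abs(empirical - prob_nor).max())  # shrinks toward 0 as n_frames grows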

Maybe I'm missing something, but I think you can build the list of states first and then vectorise the remaining steps:

# Make list of states (slow part)
states = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    states.append(current_state)
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

# Vectorised part
state_vals = state_val[states]   # alternatively np.array(states) / (n_states - 1)
trace = np.random.normal(loc=state_vals, scale=sigma)

I believe this method works and will lead to a modest speed improvement while using some extra memory (3 lists/arrays are created instead of one). @PMende's solution leads to a much larger speed improvement.
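
As a small variation (my own, not part of the answer), the intermediate Python list can be replaced by a preallocated integer array, which keeps the sequential loop but avoids building a list and converting it afterwards:

states = np.empty(n_frames, dtype=int)   # preallocated instead of a Python list
current_state = np.random.choice(state_idx)
for i in range(n_frames):
    states[i] = current_state
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

trace = np.random.normal(loc=state_val[states], scale=sigma)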

Stuart