
I am trying to calculate the lengths of sequences of ones in a three-dimensional NumPy array.

I have a binary three-dimensional matrix. It is sparse: the values are usually zero and only occasionally one.

The X and Y axes represent spatial coordinates, while the Z axis represents the time dimension.

If the value at an entry of the matrix is 1, the point at that coordinate is in the "work" (active) state at that time. If the point is not in the "work" state, the value is 0.

I want to get a list of the lengths of all sequences of consecutive active points along the time axis.


The basic setup for this situation is:

import numpy as np

n = 100  # X*Y is an n*n matrix
t = 50   # Z is time
q = 0.1  # the probability of a non-zero value

three_d = np.random.choice([0, 1], size=(n, n, t), p=[1-q, q])  # three_d.shape -> (100, 100, 50)

Now I'm wondering how to find the sequences of ones along the time axis and how to calculate the length of each one.

I considered various iterative solutions that handle the task with lists, but I couldn't come up with a vectorized solution.
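For reference, an iterative baseline of the kind mentioned above might look like the sketch below. The function name `run_lengths_iterative` and the small random test array are illustrative assumptions, not part of the original setup.

```python
from itertools import groupby

import numpy as np


def run_lengths_iterative(arr):
    """Collect lengths of runs of 1s along the last axis, one timeline at a time."""
    lengths = []
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            # groupby splits the timeline into blocks of consecutive equal values
            for value, group in groupby(arr[i, j]):
                if value == 1:
                    lengths.append(sum(1 for _ in group))
    return lengths


# Hypothetical small input, just to exercise the function
rng = np.random.default_rng(0)
small = rng.choice([0, 1], size=(4, 4, 10), p=[0.7, 0.3])
print(run_lengths_iterative(small))
```

This visits every (x, y) timeline in a Python loop, which is what a vectorized version would try to avoid.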

I saw some questions about template matching in NumPy (like [this][1] and [this][2]), but I haven't seen anything that deals with a repeating pattern, where counting requires ignoring the shorter sequences contained in a longer sequence.


To be clear, here is an explicit example that shows the input and the desired output:

Assume we have a 3 * 3 * 3 matrix:

t=0:            t=1:          t=2:
_______________________________________
1 | 1 | 0      1 | 1 | 1      1 | 0 | 1
________________________________________
0 | 0 | 0      0 | 0 | 0      0 | 0 | 0
________________________________________
1 | 0 | 0      0 | 1 | 0      1 | 0 | 0

We can see that there are 6 sequences of ones along the timeline (T):

  • One 3-length sequence in the top-left box.
  • Two 2-length sequences: one in the top-middle box and one in the top-right box.
  • Three 1-length sequences in the bottom row: one in the middle box and two, separated by a 0, in the bottom-left box.

In this case, the desired result is:

res = [3, 2, 2, 1, 1, 1]
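One possible way to check this example in code is sketched below. It builds the 3 * 3 * 3 array from the three time slices and uses a padding-and-`np.diff` approach to find where runs start and end. This is only one candidate technique under the assumption that the array is indexed as `[x, y, t]`; the names `example` and `run_lengths` are made up for illustration.

```python
import numpy as np

# The 3 x 3 x 3 example, indexed as [x, y, t]: each time slice is written out
# and the slices are stacked along the last axis.
t0 = [[1, 1, 0], [0, 0, 0], [1, 0, 0]]
t1 = [[1, 1, 1], [0, 0, 0], [0, 1, 0]]
t2 = [[1, 0, 1], [0, 0, 0], [1, 0, 0]]
example = np.stack([t0, t1, t2], axis=-1)


def run_lengths(arr):
    """Lengths of all runs of 1s along the last (time) axis."""
    # Pad one zero at both ends of the time axis so that every run has a
    # 0 -> 1 start and a 1 -> 0 end, even at the borders.
    padded = np.pad(arr, [(0, 0)] * (arr.ndim - 1) + [(1, 1)])
    diff = np.diff(padded.astype(np.int8), axis=-1)
    # Row-major flattening keeps each timeline contiguous, so the k-th start
    # pairs with the k-th end.
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return ends - starts


lengths = run_lengths(example)
print(lengths.tolist())  # [3, 2, 2, 1, 1, 1]
```

The lengths come out in timeline order (row by row); sort them if a particular order is needed.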


  [1]: https://stackoverflow.com/questions/36522220/searching-a-sequence-in-a-numpy-array
  [2]: https://stackoverflow.com/questions/42491979/finding-patterns-in-a-numpy-array
Yanirmr
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). “Show me how to solve this coding problem” is not a Stack Overflow issue. We expect you to make an honest attempt, and *then* ask a *specific* question about your algorithm or technique. Stack Overflow is not intended to replace existing documentation and tutorials. You have the algorithm well in your mind; you should be able to provide a coding attempt. – Prune May 20 '21 at 21:54
  • Hi @Prune, I know how to answer this question for lists, but not for Numpy arrays. – Yanirmr May 20 '21 at 21:56
  • @Yanirmr Is it guaranteed that every timeline has only one sequence of 1s? If not, how do you want two sequences of 1s separated by 0s in the same timeline to be counted? – Ehsan May 20 '21 at 22:25
  • I understand it such that multiple sequences of 1s are possible. If you used `np.argwhere` you'd get an array with the indices of the 1s. That way you'd reduce the complexity of the problem. If the 1s are sparse, one could use non-vectorized loops to go through the much smaller dataset. – roadrunner66 May 20 '21 at 22:36
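
The `np.argwhere` idea from the last comment could be sketched roughly as follows. This is an interpretation of the suggestion, not code from the commenter: it assumes the array is indexed as `[x, y, t]`, relies on `np.argwhere` returning indices in row-major order, and uses a hypothetical function name `run_lengths_from_indices`.

```python
import numpy as np


def run_lengths_from_indices(arr):
    """Run lengths of 1s along the last axis, looping only over the nonzero entries."""
    # Rows are (x, y, t) triples in row-major order, so the entries of one
    # timeline are adjacent and sorted by t.
    idx = np.argwhere(arr == 1)
    if len(idx) == 0:
        return []
    lengths = []
    current = 1
    for prev, cur in zip(idx[:-1], idx[1:]):
        same_cell = prev[0] == cur[0] and prev[1] == cur[1]
        if same_cell and cur[2] == prev[2] + 1:
            current += 1  # the run continues at the next time step
        else:
            lengths.append(current)  # the run ended; start a new one
            current = 1
    lengths.append(current)
    return lengths


# Hypothetical input: two timelines of length 4
demo = np.array([[[1, 1, 0, 1], [0, 1, 1, 1]]])
print(run_lengths_from_indices(demo))  # [2, 1, 3]
```

The loop runs over the nonzero entries only, so its cost depends on the number of 1s and not on the full size of the array.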
