Detect and count numerical sequence in Python array

Question

In a numerical sequence (e.g. one-dimensional array) I want to find different patterns of numbers and count each finding separately. However, the numbers can occur repeatedly but only the basic pattern is important.

# Example signal (1d array)
a = np.array([1,1,2,2,2,2,1,1,1,2,1,1,2,3,3,3,3,3,2,2,1,1,1])

# Search for these exact following "patterns": [1,2,1], [1,2,3], [3,2,1]

# Count the number of pattern occurrences
# [1,2,1] = 2 (occurs 2 times)
# [1,2,3] = 1 
# [3,2,1] = 1

I have come up with the Knuth-Morris-Pratt string matching (http://code.activestate.com/recipes/117214/), which gives me the index of the searched pattern.

for s in KnuthMorrisPratt(list(a), [1,2,1]):
    print('s')

The problem is, I don't know how to find the case, where the pattern [1,2,1] "hides" in the sequence [1,2,2,2,1]. I need to find a way to reduce this sequence of repeated numbers in order to get to [1,2,1]. Any ideas?

What constitutes a "pattern"? Your problem sounds very broad at best. — Andras Deak -- Слава Україні, Sep 29 '16 at 11:53
are patterns you are finding is predefined?? or you want to find all possible pattern for a triplet?? — armak, Sep 29 '16 at 11:54
@AndrasDeak A pattern in this case is just a sequence of numers as shown in the code comment. At ScottHunter: Tried the KnuthMorrisPratt algorithm for matching. At armak: Yes, the patterns are predefined and I just want to find these (in the example) 3 patterns. — NumbThumb, Sep 29 '16 at 12:17

JBecker · Accepted Answer · 2016-09-29T13:26:57.957

2

I don't use NumPy and I am quite new to Python, so there might be a better and more efficient solution.

I would write a function like this:

def dac(data, pattern):
    count = 0
    for i in range(len(data)-len(pattern)+1):
        tmp = data[i:(i+len(pattern))]

        if tmp == pattern:
            count +=1

    return count

If you want to ignore repeated numbers in the middle of your pattern:

def dac(data, pattern):
    count = 0
    for i in range(len(data)-len(pattern)+1):
        tmp = [data[i], data [i+1]]

        try:
            for j in range(len(data)-i):
                print(i, i+j)
                if tmp[-1] != data[i+j+1]:
                    tmp.append(data[i+j+1])

                if len(tmp) == len(pattern):
                    print(tmp)
                    break
        except:
            pass

        if tmp == pattern:
            count +=1
    return count

Hope that might help.

edited Sep 29 '16 at 13:26

answered Sep 29 '16 at 12:21

JBecker

124
7

Thanks for your contribution, but the code seems to find ONLY the exact pattern in the signal rathern than any version where certain numbers are repeated for several times. Example: The pattern [1,2,1] should be also found in the signal [1,1,2,2,2,1,1], only the transition from "1 to 2 to 1" is important and needs to be found. This also counts as a detected event. Any suggestions how to make additions to your code so that these cases are also found? – NumbThumb Sep 29 '16 at 13:06
I added a version to ignore double numbers in the middle. – JBecker Sep 29 '16 at 13:11
Nice, this does the job! Much appreciated! :) – NumbThumb Sep 29 '16 at 13:44

Chris Mueller · Answer 2 · 2016-09-29T12:34:16.697

1

Here's a one-liner that will do it

import numpy as np

a = np.array([1,1,2,2,2,2,1,1,1,2,1,1,2,3,3,3,3,3,2,2,1,1,1])
p = np.array([1,2,1])

num = sum(1 for k in 
          [a[j:j+len(p)] for j in range(len(a) - len(p) + 1)]
          if np.array_equal(k, p))

The innermost part is a list comprehension that generates all pieces of the array that are the same length as the pattern. The outer part sums 1 for every element of this list which matches the pattern.

edited Sep 29 '16 at 12:34

answered Sep 29 '16 at 12:22

Chris Mueller

6,490
5
29
35

The pattern p should be found 2 times in the array a. The sequence [1,2,1] also "hides" in [1,2,2,2,2,1] of the signal. Your suggestion just finds the sequence once if I'm right? – NumbThumb Sep 29 '16 at 12:25
Yes, this just matches the exact pattern that you input, not an arbitrary number of repetitions of one part of the pattern. – Chris Mueller Sep 29 '16 at 12:27
Any suggestions how it would be possible to reduce the signal so that no repetitions occur and the "hidden" patterns can also be found? This would solve the problem. ;) – NumbThumb Sep 29 '16 at 12:27
@Chris Mueller, I tried your solution and seems to also find the sub-patterns, - which means it worked great - tried for [2,2] and got 4 back as a result! – coder Sep 29 '16 at 12:29
There's no need to create a list just to count its elements, see http://stackoverflow.com/q/390852/12892 – Cristian Ciupitu Sep 29 '16 at 12:29
@coder Only the "inequal" pattern (e.g. [1,2,1] or [2,3,2]) are not found because of the condition of the original signal where the pattern [2,3,2] "hides" in [2,3,3,3,3,3,2] for example. – NumbThumb Sep 29 '16 at 12:32
@CristianCiupitu Thanks for the tip! Edited. – Chris Mueller Sep 29 '16 at 12:34
You can also convert the inner list comprehension to a generator comprehension. – Cristian Ciupitu Sep 29 '16 at 13:10

score 1 · Answer 3 · answered Sep 29 '16 at 13:43

The only way I could think of solving your problem with the subpatterns matching was to use regex.

The following is a demonstration for findind for example the sequence [1,2,1] in list1:

import re

list1 = [1,1,2,2,2,2,1,1,1,2,1,1,2,3,3,3,3,3,2,2,1,1,1]
str_list = ''.join(str(i) for i in list1)
print re.findall(r'1+2+1', str_list)

This will give you as a result:

>>> print re.findall(r'1+2+1', str_list)
['1122221', '1121']

Detect and count numerical sequence in Python array

3 Answers3