0

In a numerical sequence (e.g. one-dimensional array) I want to find different patterns of numbers and count each finding separately. However, the numbers can occur repeatedly but only the basic pattern is important.

# Example signal (1d array)
a = np.array([1,1,2,2,2,2,1,1,1,2,1,1,2,3,3,3,3,3,2,2,1,1,1])

# Search for these exact following "patterns": [1,2,1], [1,2,3], [3,2,1]

# Count the number of pattern occurrences
# [1,2,1] = 2 (occurs 2 times)
# [1,2,3] = 1 
# [3,2,1] = 1

I have come up with the Knuth-Morris-Pratt string matching (http://code.activestate.com/recipes/117214/), which gives me the index of the searched pattern.

for s in KnuthMorrisPratt(list(a), [1,2,1]):
    print('s')

The problem is, I don't know how to find the case, where the pattern [1,2,1] "hides" in the sequence [1,2,2,2,1]. I need to find a way to reduce this sequence of repeated numbers in order to get to [1,2,1]. Any ideas?

NumbThumb
  • 147
  • 4
  • 11

3 Answers3

2

I don't use NumPy and I am quite new to Python, so there might be a better and more efficient solution.

I would write a function like this:

def dac(data, pattern):
    count = 0
    for i in range(len(data)-len(pattern)+1):
        tmp = data[i:(i+len(pattern))]

        if tmp == pattern:
            count +=1

    return count

If you want to ignore repeated numbers in the middle of your pattern:

def dac(data, pattern):
    count = 0
    for i in range(len(data)-len(pattern)+1):
        tmp = [data[i], data [i+1]]

        try:
            for j in range(len(data)-i):
                print(i, i+j)
                if tmp[-1] != data[i+j+1]:
                    tmp.append(data[i+j+1])

                if len(tmp) == len(pattern):
                    print(tmp)
                    break
        except:
            pass

        if tmp == pattern:
            count +=1
    return count

Hope that might help.

JBecker
  • 124
  • 7
  • Thanks for your contribution, but the code seems to find ONLY the exact pattern in the signal rathern than any version where certain numbers are repeated for several times. Example: The pattern [1,2,1] should be also found in the signal [1,1,2,2,2,1,1], only the transition from "1 to 2 to 1" is important and needs to be found. This also counts as a detected event. Any suggestions how to make additions to your code so that these cases are also found? – NumbThumb Sep 29 '16 at 13:06
  • I added a version to ignore double numbers in the middle. – JBecker Sep 29 '16 at 13:11
  • Nice, this does the job! Much appreciated! :) – NumbThumb Sep 29 '16 at 13:44
1

Here's a one-liner that will do it

import numpy as np

a = np.array([1,1,2,2,2,2,1,1,1,2,1,1,2,3,3,3,3,3,2,2,1,1,1])
p = np.array([1,2,1])

num = sum(1 for k in 
          [a[j:j+len(p)] for j in range(len(a) - len(p) + 1)]
          if np.array_equal(k, p))

The innermost part is a list comprehension that generates all pieces of the array that are the same length as the pattern. The outer part sums 1 for every element of this list which matches the pattern.

Chris Mueller
  • 6,490
  • 5
  • 29
  • 35
  • The pattern p should be found 2 times in the array a. The sequence [1,2,1] also "hides" in [1,2,2,2,2,1] of the signal. Your suggestion just finds the sequence once if I'm right? – NumbThumb Sep 29 '16 at 12:25
  • Yes, this just matches the exact pattern that you input, not an arbitrary number of repetitions of one part of the pattern. – Chris Mueller Sep 29 '16 at 12:27
  • Any suggestions how it would be possible to reduce the signal so that no repetitions occur and the "hidden" patterns can also be found? This would solve the problem. ;) – NumbThumb Sep 29 '16 at 12:27
  • @Chris Mueller, I tried your solution and seems to also find the sub-patterns, - which means it worked great - tried for [2,2] and got 4 back as a result! – coder Sep 29 '16 at 12:29
  • There's no need to create a list just to count its elements, see http://stackoverflow.com/q/390852/12892 – Cristian Ciupitu Sep 29 '16 at 12:29
  • @coder Only the "inequal" pattern (e.g. [1,2,1] or [2,3,2]) are not found because of the condition of the original signal where the pattern [2,3,2] "hides" in [2,3,3,3,3,3,2] for example. – NumbThumb Sep 29 '16 at 12:32
  • @CristianCiupitu Thanks for the tip! Edited. – Chris Mueller Sep 29 '16 at 12:34
  • You can also convert the inner list comprehension to a generator comprehension. – Cristian Ciupitu Sep 29 '16 at 13:10
1

The only way I could think of solving your problem with the subpatterns matching was to use regex.

The following is a demonstration for findind for example the sequence [1,2,1] in list1:

import re

list1 = [1,1,2,2,2,2,1,1,1,2,1,1,2,3,3,3,3,3,2,2,1,1,1]
str_list = ''.join(str(i) for i in list1)
print re.findall(r'1+2+1', str_list)

This will give you as a result:

>>> print re.findall(r'1+2+1', str_list)
['1122221', '1121']
coder
  • 12,832
  • 5
  • 39
  • 53