I have a list of sequences of numbers in which I am trying to find the recurring patterns. Here is what I did to find the recurring pattern in one sequence of numbers:

import collections
import more_itertools

sequence = [
    1, 2, 4, 1, 4, 1, 2, 4, 6, 7
]

size_2 = 2
size_3 = 3
size_4 = 4

window_2 = [
    tuple(window)
    for window in more_itertools.windowed(sequence, size_2)
]
window_3 = [
    tuple(window)
    for window in more_itertools.windowed(sequence, size_3)
]
window_4 = [
    tuple(window)
    for window in more_itertools.windowed(sequence, size_4)
]

counter_2 = collections.Counter(window_2)
counter_3 = collections.Counter(window_3)
counter_4 = collections.Counter(window_4)

for window, count in counter_2.items():
    if count > 1:
        print(window, count)
for window, count in counter_3.items():
    if count > 1:
        print(window, count)
for window, count in counter_4.items():
    if count > 1:
        print(window, count)

I'm sure there is a much better way to write this code, but since I'm new to Python, this is what I could come up with. The output of the above code is:

(1, 2) 2
(2, 4) 2
(4, 1) 2
(1, 2, 4) 2

Now I need to do this for a series of sequences, for example:

sequence = [
    [1, 2, 4, 1, 4, 1, 2, 4, 6, 7],
    [3, 1, 3, 4, 3, 4, 2, 4, 7, 4, 7, 4, 6, 7, 6, 7],
    [2, 4, 1, 2, 3, 2, 3, 4, 1, 3, 4, 2, 4, 6, 4, 1, 4, 6],
]

I need to find the recurring patterns in each sequence separately. An example output would be:

[1, 2, 4, 1, 4, 1, 2, 4, 6, 7]
(1, 2) 2
(2, 4) 2
(4, 1) 2
(1, 2, 4) 2

[3, 1, 3, 4, 3, 4, 2, 4, 7, 4, 7, 4, 6, 7, 6, 7]
(3, 4) 2
(4, 7) 2
(7, 4) 2
(6, 7) 2
(4, 7, 4) 2
etc.

I've tried different ways with no luck. Any help would be appreciated.

Leila
  • It is not necessary to use `tuple(window)`; each iteration of `windowed` already yields a tuple. – Mechanic Pig Aug 20 '22 at 11:35
  • I understand it depends on the actual use case, but I'm not sure your way of counting repeating patterns is correct. In a sequence like `[1,2,3,4,1,2,3,4]` I would not say that `[1,2]`, `[2,3]`, `[3,4]`, `[1,2,3]`, `[2,3,4]` are repeated. If for instance you are using this to measure the diversity of your sequence you would count a lot of repetitions that are really not there. – gimix Aug 20 '22 at 12:11
  • @gimix That's right. Even though this is what is required at this stage, I do need to change it for another analysis. Do you know how I could fix it? – Leila Aug 21 '22 at 15:45

1 Answer

You can use nested for-loops: an outer loop over the sequences and an inner loop over the window sizes.

from collections import Counter
import more_itertools

sequence = [
    [1, 2, 4, 1, 4, 1, 2, 4, 6, 7],
    [3, 1, 3, 4, 3, 4, 2, 4, 7, 4, 7, 4, 6, 7, 6, 7],
    [2, 4, 1, 2, 3, 2, 3, 4, 1, 3, 4, 2, 4, 6, 4, 1, 4, 6],
]

for seq in sequence:
    print(seq)
    for size in [2, 3, 4]:
        for win, cnt in Counter(
            more_itertools.windowed(seq, size)
        ).items():
            if cnt > 1:
                print(win, cnt)
    print()

[1, 2, 4, 1, 4, 1, 2, 4, 6, 7]
(1, 2) 2
(2, 4) 2
(4, 1) 2
(1, 2, 4) 2

[3, 1, 3, 4, 3, 4, 2, 4, 7, 4, 7, 4, 6, 7, 6, 7]
(3, 4) 2
(4, 7) 2
(7, 4) 2
(6, 7) 2
(4, 7, 4) 2

[2, 4, 1, 2, 3, 2, 3, 4, 1, 3, 4, 2, 4, 6, 4, 1, 4, 6]
(2, 4) 2
(4, 1) 3
(2, 3) 2
(3, 4) 2
(4, 6) 2

You can also save the results in a dict, keyed by the string form of each sequence.

dct = {
    str(seq): [
        (win, cnt)
        for size in [2, 3, 4]
        for win, cnt in Counter(more_itertools.windowed(seq, size)).items()
        if cnt > 1
    ]
    for seq in sequence
}

print(dct)

{'[1, 2, 4, 1, 4, 1, 2, 4, 6, 7]': 
 [((1, 2), 2),
  ((2, 4), 2),
  ((4, 1), 2),
  ((1, 2, 4), 2)],
 '[3, 1, 3, 4, 3, 4, 2, 4, 7, 4, 7, 4, 6, 7, 6, 7]': 
 [((3, 4), 2),
  ((4, 7), 2),
  ((7, 4), 2),
  ((6, 7), 2),
  ((4, 7, 4), 2)],
 '[2, 4, 1, 2, 3, 2, 3, 4, 1, 3, 4, 2, 4, 6, 4, 1, 4, 6]': 
 [((2, 4), 2),
  ((4, 1), 3),
  ((2, 3), 2),
  ((3, 4), 2),
  ((4, 6), 2)]}
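
Regarding the follow-up in the comments about `[1,2,3,4,1,2,3,4]`: one possible reading is that a repeated window should be dropped when it sits inside a longer repeated window with the same count. The sketch below assumes that reading, which may not be exactly what is needed. The function names (`repeated_windows`, `contains`, `maximal_repeats`) are invented for this example.

```python
from collections import Counter


def repeated_windows(seq, sizes):
    """Map each window that occurs more than once to its count."""
    counts = {}
    for size in sizes:
        wins = (tuple(seq[i:i + size]) for i in range(len(seq) - size + 1))
        counts.update({w: c for w, c in Counter(wins).items() if c > 1})
    return counts


def contains(longer, shorter):
    """Return True if `shorter` appears as a contiguous run in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def maximal_repeats(seq, sizes=(2, 3, 4)):
    """Keep only repeats not covered by a longer repeat with the same count.

    Assumption: "covered" means contained in a longer repeated window
    that occurs exactly as often.
    """
    repeats = repeated_windows(seq, sizes)
    return {
        win: cnt
        for win, cnt in repeats.items()
        if not any(
            len(other) > len(win) and other_cnt == cnt and contains(other, win)
            for other, other_cnt in repeats.items()
        )
    }


print(maximal_repeats([1, 2, 3, 4, 1, 2, 3, 4]))
print(maximal_repeats([1, 2, 4, 1, 4, 1, 2, 4, 6, 7]))
```

With this filter, `[1, 2, 3, 4, 1, 2, 3, 4]` reports only `(1, 2, 3, 4)`, and the first sequence from the question keeps `(4, 1)` and `(1, 2, 4)` but drops `(1, 2)` and `(2, 4)`.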
I'mahdi