
I have a string like this:

string = 'aaabbbcccddd'

Next, I want a list that contains ALL the pieces that are 3 characters long, i.e.:

aaa, aab, abb, bbb, bbc, bcc, ccc, ccd, cdd, ddd

How do I get there? re.finditer and re.findall don't return overlapping matches, which I do need.
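For what it's worth, the re module can return overlapping matches if the pattern itself consumes no characters. A zero-width lookahead (?=...) with a capturing group inside it matches at every position, and re.findall then returns the captured text. A minimal sketch, assuming every 3-character piece is wanted (note that . does not match a newline unless re.DOTALL is passed):

```python
import re

string = 'aaabbbcccddd'

# The lookahead matches an empty string at each position, so the engine
# moves forward one character at a time; the group inside the lookahead
# captures the next three characters without consuming them.
pieces = re.findall(r'(?=(.{3}))', string)
print(pieces)
# ['aaa', 'aab', 'abb', 'bbb', 'bbc', 'bcc', 'ccc', 'ccd', 'cdd', 'ddd']
```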

Martijn Pieters
RonaldN

3 Answers


Well, there's a simple way to do this:

>>> for a, b, c in zip(string, string[1:], string[2:]):
...     print(a, b, c)
...
a a a
a a b
a b b
b b b
b b c
b c c
c c c
c c d
c d d
d d d

The same thing as a list comprehension, joining each tuple of characters into a string:

>>> ["".join(var) for var in zip(string, string[1:], string[2:])]
['aaa', 'aab', 'abb', 'bbb', 'bbc', 'bcc', 'ccc', 'ccd', 'cdd', 'ddd']
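The zip version above is hard-wired to width 3 by its three slices. One way to make the width a parameter, sketched here with a hypothetical helper name zip_windows, is to build the n shifted slices with a generator expression and unpack them into zip, which stops at the shortest slice:

```python
def zip_windows(s, n):
    """Return all length-n substrings of s using n shifted slices (hypothetical helper)."""
    # s[0:], s[1:], ..., s[n-1:]; zip stops when the shortest slice runs out,
    # so every tuple it yields has exactly n characters.
    shifted = (s[i:] for i in range(n))
    return [''.join(chars) for chars in zip(*shifted)]

print(zip_windows('aaabbbcccddd', 3))
print(zip_windows('abcd', 2))  # ['ab', 'bc', 'cd']
```

If n is larger than the string, the last slice is empty and the result is an empty list.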
Games Brainiac

You want to create a sliding window over the string:

from itertools import islice

def window(seq, n=2):
    """Return a sliding window (of width n) over data from the iterable.

    s -> (s0, s1, ..., s[n-1]), (s1, s2, ..., sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))  # the first window: the first n items
    if len(result) == n:
        yield result
    for elem in it:
        # drop the oldest item and append the next one
        result = result[1:] + (elem,)
        yield result

string = 'aaabbbcccddd'
print([''.join(chunk) for chunk in window(string, 3)])

This produces:

>>> string = 'aaabbbcccddd'
>>> [''.join(chunk) for chunk in window(string, 3)]
['aaa', 'aab', 'abb', 'bbb', 'bbc', 'bcc', 'ccc', 'ccd', 'cdd', 'ddd']
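A variant of the same sliding window, similar in spirit to the sliding_window recipe in the itertools documentation, keeps the window in a collections.deque with maxlen=n, so appending a new item automatically drops the oldest one. A minimal sketch (the function name here is illustrative, not a standard-library function):

```python
from collections import deque
from itertools import islice

def sliding_window(iterable, n):
    """Yield overlapping tuples of width n from any iterable."""
    it = iter(iterable)
    # Pre-fill with the first n-1 items; maxlen=n makes append() drop the oldest.
    win = deque(islice(it, n - 1), maxlen=n)
    for elem in it:
        win.append(elem)
        yield tuple(win)

string = 'aaabbbcccddd'
print([''.join(w) for w in sliding_window(string, 3)])
```

This avoids rebuilding the window by tuple slicing, although tuple(win) still copies n items per step, so for a width of 3 either version is fine.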
Martijn Pieters

An alternative using plain slicing, which could probably be improved:

>>> s = 'aaabbbcccddd'
>>> [s[i:i+3] for i in range(len(s)-2)]
['aaa', 'aab', 'abb', 'bbb', 'bbc', 'bcc', 'ccc', 'ccd', 'cdd', 'ddd']
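Generalized to any width n, the slicing approach needs len(s) - n + 1 start positions (the 2 in the line above is just 3 - 1). A minimal sketch with a hypothetical helper name substrings, which also rejects non-positive widths, since n = 0 would otherwise return len(s) + 1 empty strings:

```python
def substrings(s, n):
    """Return every length-n substring of s, in order (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # The last valid start index is len(s) - n; if n > len(s) the range is empty.
    return [s[i:i + n] for i in range(len(s) - n + 1)]

s = 'aaabbbcccddd'
print(substrings(s, 3))
print(substrings(s, 13))  # []
```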
Martijn Pieters
Robert