Looking for alternative, more pythonic, way to iterate over multiple characters on a string

Question

As you probably already know, the simple and reliable way to "select" single characters, pairs, triples, etc. of a string is this:

somestr = 'ABCDABCDABCDABCDABCDABCD'
a = 0
z = 3
for i in somestr:
    i = somestr[a:z]
    # here I can work with the 3 characters starting at position a
    a += 1  # or 3 for non-overlapping
    z += 1  # (and 3 here as well)

So my question is: how could this block of code be simplified in a more Pythonic way?
I'm interested in both overlapping and non-overlapping cases.

See [How do you split a list into evenly sized chunks in Python?](http://stackoverflow.com/q/312443) for the non-overlapping case. Simply use `xrange(0, len(l) - n + 1)` for the overlapping version. — Martijn Pieters, Nov 22 '13 at 17:35
You don't really need both `a` and `z` here since `z = a + constant`. — mgilson, Nov 22 '13 at 17:38
@fractal_7 - Here is a reference: http://stackoverflow.com/questions/135041/should-you-always-favor-xrange-over-range — , Nov 22 '13 at 18:08
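Following mgilson's point that z is always a + 3, the question's loop needs only one counter. A minimal sketch in Python 3 (where range is lazy, like Python 2's xrange); the names n and step are illustrative and not from the question:

```python
somestr = 'ABCDABCDABCDABCDABCDABCD'
n = 3     # substring length (z - a in the question)
step = 1  # 1 for overlapping substrings, n for non-overlapping

# The last valid start index is len(somestr) - n, and range excludes its
# stop value, so the stop is len(somestr) - n + 1. Every slice has n characters.
for a in range(0, len(somestr) - n + 1, step):
    chunk = somestr[a:a + n]
    print(chunk)
```

With step = n the same loop produces non-overlapping chunks instead.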

jonrsharpe · Answer 1 · 2013-11-22T21:42:11.540


Python's range function includes a step argument, so for the simplest case you can do:

for i in range(0, len(somestr) - 3 + 1, 3):
    chunk = somestr[i:i+3]  # work with each 3-character chunk here

You could create a generator function as follows:

def substring_generator(string, length, overlap=True):
    for i in range(0, len(string) - length + 1, 1 if overlap else length):
        yield string[i:i+length]

And use this for both cases:

>>> print(list(substring_generator("ABCDEFG", 3, True)))
['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
>>> print(list(substring_generator("ABCDEFG", 3, False)))
['ABC', 'DEF']

edited Nov 22 '13 at 21:42

answered Nov 22 '13 at 17:32

jonrsharpe


That substring generator seems to be exactly what I need, but I think it should be range(0, len(string) - length - 1), not length + 1. Can you confirm? I just checked. – Christos Karapapas Nov 22 '13 at 17:55
It's definitely +1, otherwise you don't get to the end of the string – jonrsharpe Nov 22 '13 at 21:44
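A quick check of the bound, reusing the "ABCDEFG" example from the answer: for length 3 the last start index is 7 - 3 = 4, and range excludes its stop value, so the stop has to be 7 - 3 + 1 = 5.

```python
s, length = "ABCDEFG", 3

# stop = 5: start indices 0..4, including the last valid one (4)
with_plus = [s[i:i + length] for i in range(len(s) - length + 1)]
# stop = 3: start indices 3 and 4 are never visited
with_minus = [s[i:i + length] for i in range(len(s) - length - 1)]

print(with_plus)   # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
print(with_minus)  # ['ABC', 'BCD', 'CDE']
```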

score 2 · Accepted Answer · 2013-11-22T18:12:15.660


Regex can do this job nicely and easily:

>>> from re import findall
>>> somestr = 'ABCDABCDABCDABCDABCDABCD'
>>> # no overlapping
>>> for i in findall(".{3}", somestr):
...     print(i)
...
ABC
DAB
CDA
BCD
ABC
DAB
CDA
BCD
>>> # overlapping
>>> for i in findall("(?=(.{3}))", somestr):
...     print(i)
...
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
>>>

Note that I set it to work with groups of 3, but you can pick any size by changing the number in the `{3}` quantifier.
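To make the group size a parameter, the pattern can be built from n. A sketch; regex_chunks is a hypothetical helper name. One detail worth knowing: `.` does not match a newline unless re.DOTALL is passed, so without it any chunk containing a newline would be skipped.

```python
import re

def regex_chunks(s, n, overlap=False):
    """Return n-character substrings of s via re.findall (hypothetical helper)."""
    if overlap:
        # Zero-width lookahead: the match consumes nothing, so findall tries
        # again at the next position; the group captures the n characters ahead.
        pattern = r'(?=(.{%d}))' % n
    else:
        # Each match consumes n characters, so chunks do not overlap;
        # a trailing remainder shorter than n is dropped.
        pattern = r'.{%d}' % n
    # DOTALL lets '.' match '\n' as well
    return re.findall(pattern, s, flags=re.DOTALL)

print(regex_chunks('ABCDEFG', 3))                # ['ABC', 'DEF']
print(regex_chunks('ABCDEFG', 3, overlap=True))  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
```

If the short remainder should be kept as a final chunk, the non-overlapping pattern can be written as `.{1,%d}` instead, which on 'ABCDEFG' gives ['ABC', 'DEF', 'G'].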

edited Nov 22 '13 at 18:12

answered Nov 22 '13 at 17:59

OK, I think this is by far the simplest way. I was about to choose jonrsharpe's solution, but this one with findall is even simpler since you don't have to worry about the limits. – Christos Karapapas Nov 22 '13 at 18:23

score 1 · Answer 3 · answered Nov 22 '13 at 17:36

You can use itertools:

import itertools

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)  # izip_longest in Python 2

This allows you to:

>>> [''.join(item) for item in grouper(somestr, 3)]
['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']
>>> [''.join(item) for item in grouper(somestr, 4)]
['ABCD', 'ABCD', 'ABCD', 'ABCD', 'ABCD', 'ABCD']

Note that you need to pass a string fillvalue (such as '') when the last chunk wouldn't have enough characters; with the default None padding, ''.join raises a TypeError:

>>> [''.join(item) for item in grouper(somestr, 5, fillvalue='')]
['ABCDA', 'BCDAB', 'CDABC', 'DABCD', 'ABCD']
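grouper only produces non-overlapping chunks. For the overlapping case, a sliding window can be built from itertools.tee and itertools.islice. This is a tee-based sketch; sliding_window is used here as an illustrative name, and advancing each copy relies on the "consume" idiom from the itertools recipes.

```python
from itertools import islice, tee

def sliding_window(iterable, n):
    """Yield overlapping n-tuples: 'ABCD', 3 -> ('A','B','C'), ('B','C','D')."""
    iterators = tee(iterable, n)
    for k, it in enumerate(iterators):
        # advance the k-th copy by k items without collecting them
        next(islice(it, k, k), None)
    # zip stops at the shortest copy, so only complete windows are produced
    return zip(*iterators)

print([''.join(w) for w in sliding_window('ABCDEFG', 3)])
# ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
```

Unlike slicing, this works on any iterable (for example lines streamed from a file), at the cost of building tuples that then need joining.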

score 1 · Answer 4 · answered Nov 22 '13 at 17:36


The easiest way to deal with chunks of data is zip, which is lazy in Python 3 (use itertools.izip in Python 2):

def chunks(iterable, size=2):
    it = iter(iterable)
    return zip(*[it]*size)
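Joining each tuple turns the result into substrings. A Python 3 sketch using the same repeated-iterator trick; note that zip stops at the shortest input, so a trailing partial chunk is dropped.

```python
def chunks(iterable, size=2):
    it = iter(iterable)
    # the same iterator repeated `size` times: each tuple zip builds
    # pulls `size` consecutive items from it
    return zip(*[it] * size)

somestr = 'ABCDABCDABCDABCDABCDABCD'
print([''.join(t) for t in chunks(somestr, 3)])
# ['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']
print([''.join(t) for t in chunks('ABCDEFG', 3)])
# ['ABC', 'DEF']  (the trailing 'G' is dropped)
```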

answered Nov 22 '13 at 17:36

g.d.d.c


This returns tuples of individual characters, not substrings. – Martijn Pieters Nov 22 '13 at 17:38
Agreed. It was a demonstration of approach, more than fully implemented code. – g.d.d.c Nov 22 '13 at 17:42

TehTris · Answer 5 · 2013-11-22T17:44:31.890

how_many = 3
every_other = 3
three_at_a_time_skipping_three = [somestr[somestr.index(x):somestr.index(x)+how_many] for x in somestr[::every_other]]
#this pretty much means from where you started till 'how many' you want each time, while incrementing starting point by 'every_other'
print(three_at_a_time_skipping_three)
['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']

how_many = 4
four_at_a_time_skipping_three = [somestr[somestr.index(x):somestr.index(x)+how_many] for x in somestr[::every_other]]
print(four_at_a_time_skipping_three)
['ABCD', 'DABC', 'CDAB', 'BCDA', 'ABCD', 'DABC', 'CDAB', 'BCDA']

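Adjusting how_many and every_other will give you different results.

This is super ugly and super unreadable, but the general gist is that it slices somestr starting at the position of the character it's iterating over, and [::every_other] makes it step every_other characters through somestr. One caveat: somestr.index(x) returns the position of the first occurrence of x, not of the current character, so the 3-character results only happen to be right because this string repeats every 4 characters. In the 4-character case, the last slice should start at index 21 and be 'BCD', but somestr.index('B') is 1, so it yields 'BCDA'.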
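A version that takes each start position from range rather than from str.index (which always returns the first match) keeps the same how_many / every_other parameters. A sketch; every_nth_slice is a hypothetical helper name:

```python
def every_nth_slice(s, how_many, every_other):
    # start positions come from range(), so repeated characters don't matter;
    # slices near the end may be shorter than how_many
    return [s[i:i + how_many] for i in range(0, len(s), every_other)]

somestr = 'ABCDABCDABCDABCDABCDABCD'
print(every_nth_slice(somestr, 4, 3))
# ['ABCD', 'DABC', 'CDAB', 'BCDA', 'ABCD', 'DABC', 'CDAB', 'BCD']
print(every_nth_slice('AAAB', 2, 1))
# ['AA', 'AA', 'AB', 'B']
```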
