
The simple, reliable way to select single characters, pairs, triples, etc. of a string is, as you already know, this:

somestr = 'ABCDABCDABCDABCDABCDABCD'
a = 0
z = 3
for i in somestr:
    i = somestr[a:z]
    # here I can finally work with the current 3 characters of somestr
    a += 1  # or 3 for non-overlapping
    z += 1  # or 3 for non-overlapping

So my question is: how could this block of code be simplified in idiomatic Python?
I'm interested in both overlapping and non-overlapping cases.

Christos Karapapas
  • See [How do you split a list into evenly sized chunks in Python?](http://stackoverflow.com/q/312443) for the non-overlapping case. Simply use `xrange(0, len(l) - n)` for the overlapping version. – Martijn Pieters Nov 22 '13 at 17:35
  • You don't really need both `a` and `z` here since `z = a + constant`. – mgilson Nov 22 '13 at 17:38
  • What is the difference between range and xrange? – Christos Karapapas Nov 22 '13 at 17:47
  • @fractal_7 - Here is a reference: http://stackoverflow.com/questions/135041/should-you-always-favor-xrange-over-range –  Nov 22 '13 at 18:08

5 Answers


Python's range function includes a step argument, so for the simplest case you can do:

for i in range(0, len(somestr) - 3 + 1, 3):  # + 1 so the final chunk is included
    chunk = somestr[i:i+3]

You could create a generator function as follows:

def substring_generator(string, length, overlap=True):
    for i in range(0, len(string) - length + 1, 1 if overlap else length):
        yield string[i:i+length]

And use this for both cases:

>>> print([x for x in substring_generator("ABCDEFG", 3, True)])
['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
>>> print([x for x in substring_generator("ABCDEFG", 3, False)])
['ABC', 'DEF']
jonrsharpe

Regex can do this job nicely and easily:

>>> from re import findall
>>> somestr = 'ABCDABCDABCDABCDABCDABCD'
>>> # no overlapping
>>> for i in findall(".{3}", somestr):
...     print(i)
...
ABC
DAB
CDA
BCD
ABC
DAB
CDA
BCD
>>> # overlapping
>>> for i in findall("(?=(.{3}))", somestr):
...     print(i)
...
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
CDA
DAB
ABC
BCD
>>>

Note that I had it set to work with groups of 3. You can pick any number though.
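
To make the group size configurable, the pattern can be built from a variable. The following is only a sketch, assuming Python 3; the helper name `regex_chunks` and its parameters are made up for illustration. It passes `re.DOTALL` because, without that flag, `.` does not match a newline character.

```python
import re

def regex_chunks(text, n, overlap=False):
    """Return n-character substrings of text (hypothetical helper)."""
    # A lookahead with a capture group matches at every position without
    # consuming characters, which is what yields overlapping results.
    pattern = "(?=(.{%d}))" % n if overlap else ".{%d}" % n
    return re.findall(pattern, text, flags=re.DOTALL)

print(regex_chunks("ABCDEFG", 3))                # ['ABC', 'DEF']
print(regex_chunks("ABCDEFG", 3, overlap=True))  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
```

In the non-overlapping case, any leftover characters that do not fill a whole group (the `G` here) are simply not returned.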

  • OK, I think this is by far the simplest way. I was about to choose jonrsharpe's solution, but this one with findall is even simpler since you don't have to worry about the limits. – Christos Karapapas Nov 22 '13 at 18:23

You can use itertools:

try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

This allows you to:

>>> [''.join(item) for item in grouper(somestr, 3)]
['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']
>>> [''.join(item) for item in grouper(somestr, 4)]
['ABCD', 'ABCD', 'ABCD', 'ABCD', 'ABCD', 'ABCD']

Note that you need a fillvalue when the last string wouldn't have enough characters:

>>> [''.join(item) for item in grouper(somestr, 5, fillvalue='')]
['ABCDA', 'BCDAB', 'CDABC', 'DABCD', 'ABCD']
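
If padding is not wanted at all, one possible alternative is to pull `n` items at a time with `itertools.islice` and stop when nothing is left. This is a sketch rather than part of the answer above, assuming Python 3 and string input; the name `chunked` is made up for illustration.

```python
from itertools import islice

def chunked(text, n):
    """Yield successive n-character pieces; the last one may be shorter."""
    it = iter(text)
    while True:
        # islice takes up to n characters from the shared iterator.
        piece = ''.join(islice(it, n))
        if not piece:
            return
        yield piece

print(list(chunked('ABCDABCDABCDABCDABCDABCD', 5)))
# ['ABCDA', 'BCDAB', 'CDABC', 'DABCD', 'ABCD']
```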
Simeon Visser

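The easiest way to deal with non-overlapping chunks of data is itertools.izip (the built-in zip in Python 3). Note that a trailing incomplete chunk is dropped:

try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy

def chunks(iterable, size=2):
    # Repeating one iterator makes izip pull consecutive items into each tuple.
    it = iter(iterable)
    return izip(*[it] * size)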
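
For illustration, here is how this helper behaves under Python 3, where the built-in `zip` is lazy and stands in for `izip`. The sketch also shows one way the same idea might be stretched to the overlapping case, by zipping the string against shifted copies of itself; the `windows` helper is hypothetical and not part of the answer.

```python
def chunks(iterable, size=2):
    # The same iterator repeated size times, so zip pulls consecutive items.
    it = iter(iterable)
    return zip(*[it] * size)

def windows(text, size=2):
    # Hypothetical helper: zip the string against copies shifted by 0..size-1.
    return zip(*(text[i:] for i in range(size)))

print([''.join(c) for c in chunks('ABCDEFG', 3)])   # ['ABC', 'DEF']
print([''.join(w) for w in windows('ABCDEFG', 3)])  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
```

Because `zip` stops at the shortest input, `chunks` silently drops the trailing `G`, and `windows` stops once the most-shifted copy is exhausted.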
g.d.d.c
how_many = 3
every_other = 3
# take how_many characters from each start index, advancing the start by every_other
three_at_a_time_skipping_three = [somestr[i:i+how_many] for i in range(0, len(somestr), every_other)]
print(three_at_a_time_skipping_three)
['ABC', 'DAB', 'CDA', 'BCD', 'ABC', 'DAB', 'CDA', 'BCD']

how_many = 4
four_at_a_time_skipping_three = [somestr[i:i+how_many] for i in range(0, len(somestr), every_other)]
print(four_at_a_time_skipping_three)
['ABCD', 'DABC', 'CDAB', 'BCDA', 'ABCD', 'DABC', 'CDAB', 'BCD']

Adjusting how_many and every_other will give you various results.

This is super ugly and super unreadable, but the general gist is that it slices somestr starting at each index it iterates over. The every_other step passed to range tells it how far to move the starting point each time. The last slice can come out shorter than how_many when it runs past the end of the string.

TehTris