Split a string to even sized chunks

Question

How can I take a string like 'aaaaaaaaaaaaaaaaaaaaaaa' and split it into a tuple of 4-character chunks, like ('aaaa', 'aaaa', 'aaaa')?

related: [What is the most “pythonic” way to iterate over a list in chunks?](http://stackoverflow.com/q/434287/4279) — jfs, Apr 04 '16 at 11:37
Does this answer your question? [Split string every nth character?](https://stackoverflow.com/questions/9475241/split-string-every-nth-character) — AMC, Feb 16 '20 at 00:18

score 45 · Accepted Answer · answered Jan 25 '14 at 13:40

Use textwrap.wrap:

>>> import textwrap
>>> s = 'aaaaaaaaaaaaaaaaaaaaaaa'
>>> textwrap.wrap(s, 4)
['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa']

Ashwini Chaudhary

Won't this fail if the string contains spaces? – AMC Feb 16 '20 at 00:19
`textwrap` is very powerful and IMO, for a precise task like this, offers far too many options for things like replacing tabs with spaces, fixing sentence punctuation, etc. I would be more comfortable using something much simpler. – Jonathan Hartley Aug 31 '23 at 03:53

falsetru · Answer 2 · 2014-01-25T13:50:59.677

25

Using a list comprehension or a generator expression:

>>> s = 'aaaaaaaaaaaaaaaaaaaaaaa'
>>> [s[i:i+4] for i in range(0, len(s), 4)]
['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa']

>>> tuple(s[i:i+4] for i in range(0, len(s), 4))
('aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa')

>>> s = 'a bcdefghi j'
>>> tuple(s[i:i+4] for i in range(0, len(s), 4))
('a bc', 'defg', 'hi j')
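The slicing approach above can be wrapped in a small reusable helper. This is only a minimal sketch: the name `split_chunks` and its parameters are hypothetical and do not come from the answer.

```python
def split_chunks(s, size):
    """Return a tuple of consecutive slices of s, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    # Slicing past the end of a string is safe, so the last chunk is
    # simply shorter when len(s) is not a multiple of size.
    return tuple(s[i:i + size] for i in range(0, len(s), size))


print(split_chunks('aaaaaaaaaaaaaaaaaaaaaaa', 4))
print(split_chunks('a bcdefghi j', 4))
```

Spaces are treated like any other character here, and an empty string gives an empty tuple.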

edited Jan 25 '14 at 13:50

answered Jan 25 '14 at 13:39

score 5 · Answer 3 · answered Jan 25 '14 at 13:47

Another solution using regex (note that this pattern matches only lowercase letters and drops an incomplete final chunk):

>>> s = 'aaaaaaaaaaaaaaaaaaaaaaa'
>>> import re
>>> re.findall('[a-z]{4}', s)
['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa']

James Sapam

A regular expression is a bit overkill for this. – poke Jan 25 '14 at 13:49
The asker wanted a solution, not necessarily the most optimized one, so I just showed that it can be solved this way too. Never mind, I know it's not the best of its kind. – James Sapam Jan 25 '14 at 14:08
Actually, that's a really nice solution (apart from regular expressions being slower when used in bulk) and easier to understand on first sight than the `zip()` solution. And it can easily be changed to work with arbitrary characters, including newlines: `re.findall('.{4}', s, re.DOTALL)` - Or even accept incomplete tails: `re.findall('.{1,4}', s, re.DOTALL)` – blubberdiblub Apr 05 '17 at 09:44
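The two patterns suggested in the last comment can be compared side by side. This is an illustrative sketch only: the sample string and the variable names `full_only` and `with_tail` are made up for the example.

```python
import re

# Sample text containing spaces and a newline (17 characters in total).
s = 'aaaa bbbb\ncccc dd'

# With re.DOTALL, '.' matches any character, including newlines.
full_only = re.findall('.{4}', s, re.DOTALL)    # drops an incomplete tail
with_tail = re.findall('.{1,4}', s, re.DOTALL)  # keeps a shorter final chunk

print(full_only)
print(with_tail)
```

Because the quantifier is greedy, `.{1,4}` still takes four characters wherever four are available, so only the final chunk can be shorter.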

unutbu · Answer 4 · 2014-02-27T13:05:03.797

You could use the grouper recipe, zip(*[iter(s)]*4). Note that zip stops at the shortest input, so an incomplete final chunk is dropped:

In [113]: s = 'aaaaaaaaaaaaaaaaaaaaaaa'

In [114]: [''.join(item) for item in zip(*[iter(s)]*4)]
Out[114]: ['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa']

Note that textwrap.wrap may not split s into strings of length 4 if the string contains spaces:

In [43]: textwrap.wrap('I am a hat', 4)
Out[43]: ['I am', 'a', 'hat']

The grouper recipe is faster than using textwrap:

In [115]: import textwrap

In [116]: %timeit [''.join(item) for item in zip(*[iter(s)]*4)]
100000 loops, best of 3: 2.41 µs per loop

In [117]: %timeit textwrap.wrap(s, 4)
10000 loops, best of 3: 32.5 µs per loop

And the grouper recipe can work with any iterator, while textwrap only works with strings.
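If the shorter final chunk should be kept, one option is to swap zip for itertools.zip_longest with an empty-string fill value. The sketch below assumes the items are strings (so that ''.join works); the function name `grouper_keep_tail` is hypothetical.

```python
from itertools import zip_longest


def grouper_keep_tail(iterable, n):
    """Yield strings of up to n items each; the last one may be shorter."""
    # The same iterator object is repeated n times, so each output group
    # consumes n consecutive items from the input.
    args = [iter(iterable)] * n
    # Missing positions in the last group are filled with '', which
    # disappears when the group is joined.
    for group in zip_longest(*args, fillvalue=''):
        yield ''.join(group)


print(list(grouper_keep_tail('aaaaaaaaaaaaaaaaaaaaaaa', 4)))
```

This is a variation on the recipe in the answer rather than the recipe itself, and it was not part of the timing comparison above.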

score 1 · Answer 5 · answered Jan 29 '19 at 09:42

s = 'abcdefghi'

k - length of each part of the string

k = 3

parts - list to store the parts of the string

parts = [s[i:i+k] for i in range(0, len(s), k)]

parts --> ['abc', 'def', 'ghi']

Himanshu

score 0 · Answer 6 · answered Sep 06 '16 at 08:18

s = 'abcdef'

We need to split it into parts of length 2:

[s[pos:pos+2] for pos, _ in enumerate(s) if pos % 2 == 0]

Result:

['ab', 'cd', 'ef']

Arindam Roychowdhury

score 0 · Answer 7 · answered Dec 01 '19 at 17:17

I think this method is simpler. However, the message length must be evenly divisible by split_size; otherwise the trailing characters are dropped, as in the output below. Alternatively, pad the message first (for example, message = "lorem ipsum_") and delete the added character afterwards.

message = "lorem ipsum"

array = []

temp = ""

split_size = 3

for i in range(1, len(message) + 1):
    temp += message[i - 1]

    if i % split_size == 0:
        array.append(temp)
        temp = ""

print(array)

Output: ['lor', 'em ', 'ips']
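The dropped tail ('um' in the output above) can also be kept without padding by appending whatever is left in the buffer after the loop. This is a sketch of one possible adjustment, not the answer's own code; the function name `split_keep_remainder` is hypothetical.

```python
def split_keep_remainder(message, split_size):
    """Split message into pieces of split_size, keeping a shorter last piece."""
    array = []
    temp = ""
    for ch in message:
        temp += ch
        if len(temp) == split_size:
            array.append(temp)
            temp = ""
    # Anything still in temp is the incomplete final piece.
    if temp:
        array.append(temp)
    return array


print(split_keep_remainder("lorem ipsum", 3))
```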

BPL · Answer 8 · 2020-03-24T17:18:12.820

Here's another possible solution to the given problem:

def split_by_length(text, width):
    width = max(1, width)
    chunk = ""
    for v in text:
        chunk += v
        if len(chunk) == width:
            yield chunk
            chunk = ""

    if chunk:
        yield chunk

if __name__ == '__main__':
    x = "123456789"
    for i in range(20):
        print(i, list(split_by_length(x, i)))

Output:

0 ['1', '2', '3', '4', '5', '6', '7', '8', '9']
1 ['1', '2', '3', '4', '5', '6', '7', '8', '9']
2 ['12', '34', '56', '78', '9']
3 ['123', '456', '789']
4 ['1234', '5678', '9']
5 ['12345', '6789']
6 ['123456', '789']
7 ['1234567', '89']
8 ['12345678', '9']
9 ['123456789']
10 ['123456789']
11 ['123456789']
12 ['123456789']
13 ['123456789']
14 ['123456789']
15 ['123456789']
16 ['123456789']
17 ['123456789']
18 ['123456789']
19 ['123456789']

score 0 · Answer 9 · answered Apr 19 '21 at 05:10

The kiddy way

def wrap(string, max_width):
    i=0
    strings = []
    s = ""
    for x in string:
        i+=1
        if i == max_width:
            s = s + x
            strings.append(s)
            s = ""
            i = 0
        else:
            s = s + x
    if s:  # avoid appending an empty string when the length divides evenly
        strings.append(s)
    return strings

wrap('ABCDEFGHIJKLIMNOQRSTUVWXYZ',4)
# output: ['ABCD', 'EFGH', 'IJKL', 'IMNO', 'QRST', 'UVWX', 'YZ']
