What's the best way to split a string into fixed length chunks and work with them in Python?

Question

I am reading in a line from a text file using:

   file = urllib2.urlopen("http://192.168.100.17/test.txt").read().splitlines()

and outputting it to an LCD display, which is 16 characters wide, using a telnetlib.write command. If a line is longer than 16 characters, I want to break it into 16-character sections and push each section out after a certain delay (e.g. 10 seconds). Once that is complete, the code should move on to the next line of the input file and continue.

I've tried searching various solutions and reading up on itertools etc. but my understanding of Python just isn't sufficient to get anything to work without doing it in a very long winded way using a tangled mess of if then else statements that's probably going to tie me in knots!

What's the best way for me to do what I want?

To split into chunks, the `chunks` function [here](http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python) should work. — mgilson, Sep 17 '13 at 16:09
@mgilson I vote to close as a duplicate, because the answer is the same — Marcin, Sep 17 '13 at 16:39
@mgilson @Marcin - this question is slightly different if you consider that when the input is a string, you can use the `re` module to chunk it with `re.findall('.{%d}' % length, string)`. — carl.anderson, Aug 21 '14 at 16:02
@carl.anderson -- sure you _could_. I'm not convinced that it would be faster (although maybe ...) and it's definitely not easier to read to my un-re-trained eye. — mgilson, Aug 21 '14 at 16:35

rlms · Accepted Answer · 2013-09-18T14:31:41.407

79

One solution would be to use this function:

def chunkstring(string, length):
    return (string[0+i:length+i] for i in range(0, len(string), length))

This function returns a generator, built with a generator expression. Each item it yields is a slice of the string that starts at a multiple of the chunk length and ends one chunk length later.

You can iterate over the generator like a list, tuple or string - for i in chunkstring(s,n): , or convert it into a list (for instance) with list(generator). Generators are more memory efficient than lists because they generate their elements as they are needed, not all at once; however, they lack certain features like indexing.
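
As a minimal sketch of the indexing point (the names `s`, `chunks` and `third` are only illustrative): a specific chunk can be reached either by turning the generator into a list or by skipping ahead with `itertools.islice`.

```python
from itertools import islice

def chunkstring(string, length):
    # Same idea as the function above: successive slices of `length` characters.
    return (string[i:i + length] for i in range(0, len(string), length))

s = "abcdefghijklmnopqrstuvwxyz"

# Option 1: build a list, which supports len() and indexing.
chunks = list(chunkstring(s, 5))
print(len(chunks))   # how many chunks there are
print(chunks[2])     # the third chunk (indices start at 0)

# Option 2: stay lazy and skip the first two chunks instead of storing them all.
third = next(islice(chunkstring(s, 5), 2, None))
print(third)
```

The list is the simpler choice when the string is short, as it would be for a single line of a text file.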

This generator also yields any shorter chunk left over at the end:

>>> list(chunkstring("abcdefghijklmnopqrstuvwxyz", 5))
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']

Example usage:

text = """This is the first line.
           This is the second line.
           The line below is true.
           The line above is false.
           A short line.
           A very very very very very very very very very long line.
           A self-referential line.
           The last line.
        """

lines = (i.strip() for i in text.splitlines())

for line in lines:
    for chunk in chunkstring(line, 16):
        print(chunk)
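
For the display case in the question, the same nested loop could be combined with a delay. This is only a sketch under assumptions: `show_lines`, `send`, `width` and `delay` are hypothetical names, and `send` stands in for whatever function wraps the asker's telnetlib write call.

```python
import time

def chunkstring(string, length):
    return (string[i:i + length] for i in range(0, len(string), length))

def show_lines(lines, send, width=16, delay=10):
    """Pass each line to `send` in `width`-character chunks, pausing after each chunk."""
    for line in lines:
        for chunk in chunkstring(line.strip(), width):
            send(chunk)        # hypothetical callback that updates the display
            time.sleep(delay)  # wait before pushing the next chunk

# Demonstration: collect the chunks in a list instead of writing to a display.
sent = []
show_lines(["A short line.", "A very very very long line indeed."], sent.append, delay=0)
print(sent)
```

Note that an empty line produces no chunks at all, so the display would not be touched for it; whether that is the desired behaviour depends on the application.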

edited Sep 18 '13 at 14:31

answered Sep 17 '13 at 16:14

I just about understand what the function is doing but I'm still missing some bits, such as how best to use the generated chunks. For example I have "for line in file" followed by the code to update the display followed by a wait but how should I step through each chunk before moving onto the next line (i.e. how do I know how many chunks I have and refer to them e.g. if I use "for i in chunkstring(s,n):" how do I "print" chunk 1 or chunk 3?) – LostRob Sep 18 '13 at 12:58
Never mind, I didn't understand your answer very well. I read [this](http://stackoverflow.com/questions/231767/the-python-yield-keyword-explained) explanation of iterables and generators which helped me realise my mistake. – LostRob Sep 18 '13 at 14:46
I hadn't seen your edit. Thank you, that also clarifies it for me! – LostRob Sep 18 '13 at 14:49
@LostRob That's unsurprising, I edited it in response to your comment! – rlms Sep 18 '13 at 14:52
Great snippet. And despite the name, there's nothing limiting it to strings. – Cerin Jul 09 '14 at 02:45

score 10 · Answer 2 · answered Aug 21 '14 at 15:54

My favorite way to solve this problem is with the re module.

import re

def chunkstring(string, length):
  return re.findall('.{%d}' % length, string)

One caveat here is that re.findall will not return a chunk that is less than the length value, so any remainder is skipped.

However, if you're parsing fixed-width data, this is a great way to do it.

For example, if I want to parse a block of text that I know is made up of 32-character records (like a header section), I find this very readable and see no need to generalize it into a separate function (as in chunkstring):

for header in re.findall('.{32}', header_data):
  ProcessHeader(header)
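
The remainder caveat can be seen in a short sketch. The `keep_remainder` parameter is a hypothetical addition for illustration, not part of the answer's function. The `{1,n}` quantifier accepts a shorter final chunk, and `re.DOTALL` is passed because `.` does not match a newline by default.

```python
import re

def chunkstring(string, length, keep_remainder=True):
    # "{1,n}" accepts a shorter final chunk; "{n}" accepts only full-length chunks.
    quantifier = "{1,%d}" % length if keep_remainder else "{%d}" % length
    # re.DOTALL lets "." match newlines too, so embedded newlines are not skipped.
    return re.findall("." + quantifier, string, flags=re.DOTALL)

full = chunkstring("abcdefgh", 3)
strict = chunkstring("abcdefgh", 3, keep_remainder=False)
print(full)    # includes the 2-character tail
print(strict)  # tail is dropped
```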

Nice. An incomplete final chunk can be included using `re.findall('.{1,%d}' % length, string)`. — tom, Nov 28 '18 at 02:40
More readable but 50% slower than regular slicing, from a few `timeit` runs I did. — Giacomo Lacava, Aug 03 '19 at 11:54

FObersteiner · Answer 3 · 2022-09-26T10:49:59.730

8

The standard library offers textwrap.wrap:

from textwrap import wrap

s = "some random text that should be splitted into chunks"

print(wrap(s, width=3))

['som', 'e r', 'and', 'om ', 'tex', 't t', 'hat', 'sho', 'uld', 'be ', 'spl', 
 'itt', 'ed ', 'int', 'o c', 'hun', 'ks']
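
One thing worth checking before relying on this: `textwrap.wrap` is a word-wrapping tool rather than a plain slicer, so it treats whitespace specially. The sketch below reuses the string from above; the variable names `pieces` and `lines16` are only illustrative.

```python
from textwrap import wrap

s = "some random text that should be splitted into chunks"

# With a tiny width the pieces look like plain slices, but whitespace that
# falls at the start of a piece is dropped, so joining them does not restore s.
pieces = wrap(s, width=3)
print("".join(pieces) == s)

# With a realistic width, wrap() breaks between words where it can.
lines16 = wrap(s, width=16)
for piece in lines16:
    print(repr(piece))
```

For a 16-character display this word-aware behaviour may well be what is wanted; for fixed-width data, where every character position matters, plain slicing is the safer choice.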

edited Sep 26 '22 at 10:49

answered Sep 26 '22 at 10:41

This should be the accepted solution as it uses the standard library! – fuomag9 Jan 10 '23 at 15:03
Technically, the other solutions also use only the standard lib, but I found this to be a very convenient wrapper (no pun intended ;-)) – FObersteiner Jan 10 '23 at 15:38

score 3 · Answer 4 · answered Jan 09 '19 at 20:44

I know it's an oldie, but I'd like to add how to chop up a string with variable-length columns:

def chunkstring(string, lengths):
    return (string[pos:pos+length].strip()
            for idx,length in enumerate(lengths)
            for pos in [sum(map(int, lengths[:idx]))])

column_lengths = [10,19,13,11,7,7,15]
fields = list(chunkstring(line, column_lengths))
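
The snippet above assumes a `line` variable that already holds a record. A self-contained variant might look like the sketch below, which swaps the repeated `sum(...)` for `itertools.accumulate` so the offsets are computed once. The sample record and its column widths are made up for illustration.

```python
from itertools import accumulate

def chunkstring(string, lengths):
    # Running totals of the widths give each column's end offset.
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return (string[start:end].strip() for start, end in zip(starts, ends))

# Hypothetical record: a 10-character name, a 4-character year, a 6-character amount.
line = "Alice".ljust(10) + "2013" + "42.5".rjust(6)
fields = list(chunkstring(line, [10, 4, 6]))
print(fields)
```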

score 2 · Answer 5 · answered Oct 04 '20 at 18:26

I think this way is easier to read:

string = "when an unknown printer took a galley of type and scrambled it to make a type specimen book."
length = 20
list_of_strings = []
for i in range(0, len(string), length):
    list_of_strings.append(string[i:length+i])
print(list_of_strings)

score 1 · Answer 6 · answered Oct 10 '22 at 14:30

Doing it with list-comprehension:

n = "aaabbbcccddd"
k = 3
[n[i:i+k] for i in range(0,len(n),k)]
=> ['aaa', 'bbb', 'ccc', 'ddd']

Shuizid

score 0 · Answer 7 · answered Mar 18 '23 at 07:11

Doing it with even more simplicity:

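str_to_split = "KIMJEONG"  # Your string to split here
n = 4  # Your chunk length here
buf = ""
ourchunks = []
x = 0

for i in str_to_split:
    x += 1
    buf += i
    if x % n == 0:  # compare against n rather than a hardcoded 4
        ourchunks.append(buf)
        buf = ""

if buf:  # keep any shorter final chunk
    ourchunks.append(buf)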

SUSMAN SUSMAN
