I wrote code that extracts words from a corpus, tokenizes them, and compares them to sentences. The output is a bag-of-words vector (1 if the word is in the sentence, 0 if not).
from nltk import FreqDist
from nltk.corpus import brown

news = brown.words(categories='news')
news_sents = brown.sents(categories='news')

# build the vocabulary from the 100 most frequent lowercased words
fdist = FreqDist(w.lower() for w in news)
vocabulary = [word for word, _ in fdist.most_common(100)]

# open the file once instead of reopening it for every sentence
with open("D:\\test\\Vector.txt", "a") as f:
    for i, sent in enumerate(news_sents):
        # lowercase the sentence words so they match the lowercased vocabulary
        sent_words = {w.lower() for w in sent}
        features = {word: int(word in sent_words) for word in vocabulary}
        bow = "".join(str(v) for v in features.values())
        print(bow, file=f)
In this case the output string is 100 characters long. I want to split it into chunks of some arbitrary length and assign a chunk number to each chunk. For example:
print(i+1, chunk_id, bow, sep="\t", end="\n", file=f)
where i+1 is the sentence id. To visualize what I mean, let's take two strings of length 12, "110010101111" and "011011000011". The output should look like:
1 1 1100
1 2 0101
1 3 1111
2 1 0110
2 2 1100
2 3 0011
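For reference, one way this might look inside the loop above is the following sketch. The chunk length n = 4 is my assumption here (any length would work), and the chunk ids come from numbering the slices with enumerate:

n = 4  # assumed chunk length, chosen only for illustration

with open("D:\\test\\Vector.txt", "a") as f:
    for i, sent in enumerate(news_sents):
        sent_words = {w.lower() for w in sent}
        bow = "".join(str(int(word in sent_words)) for word in vocabulary)
        # walk the string in steps of n and number the chunks from 1
        for chunk_id, start in enumerate(range(0, len(bow), n), start=1):
            print(i + 1, chunk_id, bow[start:start + n], sep="\t", file=f)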