I have a very long input stream, which I read line by line with a generator. Organizing the data into batches greatly improves the processing rate. The reading loop looks approximately like this:
import itertools

# Create a tuple-stream generator (an example, not the real code)
input_gen = ((vals[0], vals[0]) for block in input_file for vals in block.split())

while True:
    batch = tuple(itertools.islice(input_gen, 42))  # 42 is the batch size
    if len(batch) == 0:
        break
    # process batch
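For concreteness, here is a minimal self-contained version of the same pattern; the `input_file` contents and the batch size are made up purely for illustration:

```python
import itertools

# Stand-in for the real input stream: a few "blocks" of whitespace-separated tokens.
input_file = ["a b c", "d e f g", "h"]

# Tuple-stream generator, same shape as the real code.
input_gen = ((vals[0], vals[0]) for block in input_file for vals in block.split())

batches = []
while True:
    batch = tuple(itertools.islice(input_gen, 3))  # batch size 3 for the demo
    if len(batch) == 0:
        break
    batches.append(batch)  # stands in for "process batch"
```

With eight tokens and a batch size of 3, this yields two full batches and one final short batch.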
The while-if construction looks cumbersome. Is it possible to organize this code with a simple for loop?
For example:
for batch in <some_expression>:
# process batch
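One expression I am aware of that fits this shape is the two-argument form of the built-in iter(), which calls a function repeatedly until it returns a sentinel value (here, the empty tuple). A sketch on the same made-up data as before:

```python
import itertools

input_file = ["a b c", "d e f g", "h"]  # stand-in data for illustration
input_gen = ((vals[0], vals[0]) for block in input_file for vals in block.split())

batches = []
# iter(callable, sentinel) yields callable() results until one equals the sentinel ().
for batch in iter(lambda: tuple(itertools.islice(input_gen, 3)), ()):
    batches.append(batch)  # stands in for "process batch"
```

This produces exactly the same batches as the while-if version, since an exhausted islice yields the empty tuple, which matches the sentinel and stops the loop.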