
Hoping that someone can help me understand the following. I'm writing a small program to read a CSV file in K-line chunks. I've seen the other Stack Overflow questions about this, and that's not what I'm asking here. I'm trying to understand why one program terminates and the other never does.

This code never terminates:

```python
from __future__ import print_function
from itertools import islice
import time
import csv

def gen_csv1(input_file, chunk_size=50):
    try:
        with open(input_file) as in_file:
            csv_reader = csv.reader(in_file)
            while True:
                yield islice(csv_reader, chunk_size)
    except StopIteration:
        pass

gen1 = gen_csv1('./test100.csv')

for chunk in gen1:
    print(list(chunk))
    time.sleep(1)
```

This version, on the other hand, works fine. The only difference is that the `islice` call happens outside the generator instead of being what the generator yields.

```python
def gen_csv(input_file):
    try:
        with open(input_file) as in_file:
            csv_reader = csv.reader(in_file)
            while True:
                yield next(csv_reader)
    except StopIteration:
        pass


gen = gen_csv('./test100.csv')
for chunk in gen:
    rows = islice(gen, 50)
    print(list(rows))
    time.sleep(1)
```
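A side effect of the second version that the question doesn't ask about: the `for` loop itself calls `next(gen)` on every pass, so the row bound to `chunk` is consumed and never printed. That same `next` call is also what ends the loop: once `gen` is exhausted, it raises `StopIteration`, which `for` treats as the normal end of iteration. The sketch below assumes a hypothetical `gen_rows` standing in for `gen_csv`, with integers in place of CSV rows and a chunk size of 3 instead of 50.

```python
from itertools import islice

def gen_rows(rows):
    # Hypothetical stand-in for gen_csv: yields one row at a time
    for row in rows:
        yield row

gen = gen_rows(range(7))
printed = []
for first in gen:                           # the for loop takes one row here...
    printed.append(list(islice(gen, 3)))    # ...and islice takes the next 3
print(printed)  # [[1, 2, 3], [5, 6]]  (rows 0 and 4 are never printed)
```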

I'm stumped. Any guidance is hugely appreciated. This is more out of curiosity than for work reasons.

mistertee
  • With the help of https://stackoverflow.com/questions/1915170/split-a-generator-iterable-every-n-items-in-python-splitevery I found a working solution but would still like to understand what's going on here. – mistertee Jul 08 '17 at 13:58

1 Answer


Per the docs,

[islice] works like a slice() on a list but returns an iterator.

When you slice an empty list, an empty list is returned:

```
In [118]: [][:3]
Out[118]: []
```

Similarly, when you `islice` an empty (or exhausted) iterator, you get back an iterator that yields nothing. In contrast, calling `next` on an empty iterator raises `StopIteration`:

```
In [98]: from itertools import islice
In [114]: reader = iter([])

In [115]: list(islice(reader, 3))
Out[115]: []

In [116]: next(reader)
StopIteration:
```
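Applied to the first version: a generator that keeps yielding `islice` objects over one shared iterator never stops on its own, because the exhausted iterator just produces empty slices. The sketch below uses a hypothetical `gen_chunks` with the same shape as `gen_csv1`, integers in place of CSV rows, and an outer `islice` that takes only the first five chunks so the demo terminates.

```python
from itertools import islice

def gen_chunks(rows, chunk_size):
    # Same shape as gen_csv1: yields islice objects forever
    it = iter(rows)
    while True:
        yield islice(it, chunk_size)

# Take only 5 chunks; without this outer islice the loop would never end
first_five = [list(chunk) for chunk in islice(gen_chunks(range(5), 2), 5)]
print(first_five)  # [[0, 1], [2, 3], [4], [], []]
```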

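Creating an `islice` never raises `StopIteration`: once `csv_reader` is exhausted, each new `islice(csv_reader, chunk_size)` is simply an empty iterator, and `list(chunk)` returns `[]`. Nothing ever raises inside `gen_csv1`, so its `while True` loop keeps yielding empty chunks, the `except StopIteration` clause is never reached, and the first version never terminates. In the second version, `next(csv_reader)` does raise `StopIteration` once the file is exhausted, which ends the generator.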
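One way to make the first version terminate (a sketch, not the only approach) is to materialize each `islice` into a list and stop as soon as a chunk comes back empty. `gen_csv_chunks` is a hypothetical name, it takes an already-open file object rather than a path, and `io.StringIO` stands in for `./test100.csv`. Ending the generator with `return` instead of relying on a `StopIteration` also sidesteps PEP 479, under which a `StopIteration` escaping a generator body is turned into `RuntimeError` (the default since Python 3.7).

```python
import csv
import io
from itertools import islice

def gen_csv_chunks(in_file, chunk_size=50):
    """Yield lists of up to chunk_size rows; stop when the reader is exhausted."""
    reader = csv.reader(in_file)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            # islice produced nothing, so the reader has no rows left
            return
        yield chunk

# In-memory file standing in for './test100.csv'
sample = io.StringIO("a,1\nb,2\nc,3\nd,4\ne,5\n")
chunks = list(gen_csv_chunks(sample, chunk_size=2))
print(chunks)
# [[['a', '1'], ['b', '2']], [['c', '3'], ['d', '4']], [['e', '5']]]
```

With a real file, open it with `open('./test100.csv', newline='')` (as the `csv` docs recommend) and pass the file object in. Each chunk is a list, so up to `chunk_size` rows are held in memory at once, which is usually the point of chunking anyway.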

unutbu
  • @ubuntu - would this also imply that `islice` is not using `next` internally? – mistertee Jul 10 '17 at 11:07
  • @mistertee: The top level Python docs only specify what the behavior of the Python language should be. The implementation details depend on the flavor of Python. Conceivably, `islice` could call `next` inside a `try..except` statement, and handle any `StopIteration` exception that arises. But in the most common flavor of Python, CPython, `islice` is [implemented in C](https://github.com/python/cpython/blob/master/Modules/itertoolsmodule.c#L1502). It doesn't call the `next` (Python) function; the case of an empty iterator is handled at a lower level. – unutbu Jul 10 '17 at 11:30
  • @ubuntu - thank you again. And thanks for the link to the itertools source. Very interesting. – mistertee Jul 10 '17 at 14:19
  • sorry I misspelled your name. Must be muscle memory. – mistertee Jul 10 '17 at 15:25
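To illustrate the "could call `next` inside a `try..except`" point from the comments: a simplified pure-Python stand-in, assuming a hypothetical `my_islice` that supports only a `stop` argument. This is not how CPython implements `islice` (that is C code); the `itertools` documentation gives a fuller "roughly equivalent" Python version.

```python
from itertools import islice

def my_islice(iterable, stop):
    """Simplified islice(iterable, stop): no start/step support."""
    it = iter(iterable)
    for _ in range(stop):
        try:
            item = next(it)
        except StopIteration:
            # Source ran dry early: end quietly instead of letting
            # StopIteration escape (PEP 479 would turn it into RuntimeError)
            return
        yield item

print(list(my_islice([], 3)))        # []
print(list(my_islice(range(10), 3)))  # [0, 1, 2]
print(list(islice(range(10), 3)))     # [0, 1, 2]
```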