Why does iteration over rows delete data in csv.reader and csv.DictReader?

Question

Create any nonempty CSV file and call it test.csv. Consider the code:

import csv

with open("test.csv") as read_file:
    # this test case also applies to csv.reader()
    check_file = csv.DictReader(read_file)

    # 1) with a nonempty csv file, this prints nonempty output
    for row in check_file:
        print(row)

    # 2) this prints nothing
    for row in check_file:
        print(row)

In other words, iterating over the rows of check_file appears to have deleted all the data in check_file: loop 1) prints nonempty output, but the identical loop 2) prints nothing at all.

There is a simple, but inelegant, solution:

import csv

with open("test.csv") as read_file:
    # this test case also applies to csv.reader()
    check_file = csv.DictReader(read_file)

    # 1) with a nonempty csv file, this prints nonempty output
    for row in check_file:
        print(row)

with open("test.csv") as read_file:
    check_file = csv.DictReader(read_file)

    # 2) this prints the same output as 1)
    for row in check_file:
        print(row)

What is the explanation for this odd behaviour?

In the first one it has not "deleted data" because the data was never in memory to start with; rather, it has iterated to exhaustion over the underlying file object. In the second case, you create a new file handle positioned at the start again. — alani, Jul 22 '20 at 22:03
@alaniwi I see, so the reader/DictReader object does not load the csv into memory, but rather accesses it one row at a time? — pythonuser, Jul 22 '20 at 22:12
It is a wrapper for the underlying file object, which similarly accesses the file one line at a time. (It may actually use a fixed-sized buffer in memory to minimize requests to read from the disk, but you don't have access to such a buffer from Python.) — Karl Knechtel, Jul 22 '20 at 22:22

alani · Accepted Answer · 2020-07-22T22:29:16.020

csv.DictReader does not read the whole file into memory. It acts as an iterator that consumes lines from read_file on demand, and the file object read_file in turn reads lines from the file on demand. By the time the first loop has finished, the file pointer is positioned at the end of the file, so iterating a second time yields no more rows. However, if you rewind the file pointer to the end of the first line (where it is once csv.DictReader has read the header row), you can iterate again using the existing objects, without having to reopen the file and create a new DictReader object:

import csv 

with open("test.csv") as read_file:
    check_file = csv.DictReader(read_file)
     
    #1) with a nonempty csv file, this will return a nonempty output
    for row in check_file:
        print(row)

    read_file.seek(0)  # <==== back to the start
    next(read_file)  # <==== discard the header row
         
    #2) this will now give you output again...
    for row in check_file:
        print(row)
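
The on-demand consumption described above can be checked with a minimal sketch. It assumes an in-memory io.StringIO object standing in for test.csv; the sample contents (the name and qty columns) are made up purely for illustration.

```python
import csv
import io

# Hypothetical in-memory stand-in for test.csv (contents are an assumption)
read_file = io.StringIO("name,qty\napple,1\npear,2\n")
check_file = csv.DictReader(read_file)

# Asking for one row consumes only the header line and the first data line
first_row = next(check_file)

# The rest of the file has not been read by the DictReader yet
remaining = read_file.read()

# read() moved the file position to the end, so the reader is now exhausted
leftover = list(check_file)

print(first_row)
print(repr(remaining))
print(leftover)
```

Nothing is deleted at any point: the reader simply has no more lines to pull from the file object once its position is at the end.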

The same applies to csv.reader(). Because csv.reader() does not treat the first line as a header, rewinding with read_file.seek(0) alone is enough to repeat the output; there is no header row to skip afterwards.
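
For csv.reader(), that might look like the sketch below. The temporary file and its contents are assumptions made so the example is self-contained; list() is used only to make the three passes easy to compare.

```python
import csv
import os
import tempfile

# Hypothetical sample file standing in for test.csv
path = os.path.join(tempfile.mkdtemp(), "test.csv")
with open(path, "w", newline="") as f:
    f.write("name,qty\napple,1\npear,2\n")

with open(path, newline="") as read_file:
    check_file = csv.reader(read_file)

    first_pass = list(check_file)   # every line, header included
    exhausted = list(check_file)    # empty: the file position is at the end

    read_file.seek(0)               # rewind; there is no header row to skip
    second_pass = list(check_file)  # the same rows as the first pass

print(first_pass)
print(exhausted)
print(second_pass)
```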
