I have a large CSV file called myfile.csv and a function that counts its rows.

That function then calls a second function that should do the same thing, but the second one does not produce the same result.

The script:

```python
import csv


def main():
    csvfile = open('myfile.csv', 'r')
    reader = csv.DictReader(csvfile)
    mycount = myfunc(reader)


def myfunc(reader):
    print "FIRST: " + str(reader)
    count = 0
    for row in reader:
        count = count + 1
    print "ONE: " + str(count)
    myfunc_two(reader)


def myfunc_two(reader):
    print "SECOND: " + str(reader)
    count = 0
    for row in reader:
        count = count + 1
    print "TWO: " + str(count)


if __name__ == '__main__':
    main()
```

The output:

```
FIRST: <csv.DictReader instance at 0xb7240a8c>
ONE: 180433
SECOND: <csv.DictReader instance at 0xb7240a8c>
TWO: 0
```

Why is the file suddenly empty in myfunc_two?
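
A likely explanation: `csv.DictReader` is an iterator over the underlying file, so the first `for` loop consumes every row and the second loop starts where the first one stopped, at the end of the file. The minimal Python 3 sketch below shows this; the in-memory `io.StringIO` sample data and the helper `count_rows` are hypothetical stand-ins for myfile.csv and the question's functions.

```python
import csv
import io

# Hypothetical sample data standing in for myfile.csv: a header plus 3 rows.
SAMPLE = "a,b\n1,2\n3,4\n5,6\n"


def count_rows(reader):
    """Count the rows a reader yields; this advances (consumes) the reader."""
    count = 0
    for _ in reader:
        count += 1
    return count


csvfile = io.StringIO(SAMPLE)
reader = csv.DictReader(csvfile)

first = count_rows(reader)   # reads every data row
second = count_rows(reader)  # the same reader is already exhausted
print("ONE:", first)   # ONE: 3
print("TWO:", second)  # TWO: 0
```

Both calls receive the same reader object, which matches the identical addresses printed for FIRST and SECOND in the question's output.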

EDIT 1: Because myfunc leaves the read cursor at the end of the file, myfunc_two counts zero rows.

But since myfunc_two has no access to csvfile and therefore cannot call csvfile.seek(0), is passing csvfile to both functions the only way to solve this?

NOTE: reader.seek(0) does not work, because DictReader has no seek method.
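
The comments below suggest seeking the file object rather than the reader. A minimal Python 3 sketch of that idea, assuming the function has access to the file object: rewind with `seek(0)`, then skip the header line, because the existing `DictReader` has already stored the header in `fieldnames`. The helpers `count_rows` and `rewind` and the `io.StringIO` sample data are hypothetical.

```python
import csv
import io

SAMPLE = "a,b\n1,2\n3,4\n5,6\n"  # hypothetical stand-in for myfile.csv


def count_rows(reader):
    """Count the rows a reader yields (this consumes the reader)."""
    return sum(1 for _ in reader)


def rewind(csvfile):
    """Rewind the underlying file so the existing DictReader can be reused.

    The DictReader already read the header into `fieldnames`, so the header
    line is skipped here; otherwise it would come back as a data row.
    """
    csvfile.seek(0)
    next(csvfile)  # skip the header line


csvfile = io.StringIO(SAMPLE)
reader = csv.DictReader(csvfile)

first = count_rows(reader)
rewind(csvfile)              # seek the file object, not the reader
second = count_rows(reader)
print("ONE:", first, "TWO:", second)  # ONE: 3 TWO: 3
```

This still means passing the file object around alongside the reader, which is the coupling the question wants to avoid.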

EDIT 2: The code below works and keeps the same structure as the code above. I have not posted it as an answer because I can't offer a clear explanation of why it works.

```python
import csv


def main():
    file_contents = []
    with open('flights_table_all.csv', 'rb') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            file_contents.append(row)
        leaders = myfunc(file_contents)


def myfunc(reader):
    count = 0
    for row in reader:
        count = count + 1
    print "ONE: " + str(count)
    myfunc_two(reader)


def myfunc_two(reader):
    count = 0
    for row in reader:
        count = count + 1
    print "TWO: " + str(count)


if __name__ == '__main__':
    main()
```
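
A plausible explanation for why EDIT 2 works: `file_contents` is a list, and every `for` loop over a list gets a fresh iterator, whereas a `DictReader` can be consumed only once. The Python 3 sketch below uses `rows = list(reader)`, as suggested in the comments, to read the data in a single pass; the sample data and `count_rows` are hypothetical.

```python
import csv
import io

SAMPLE = "a,b\n1,2\n3,4\n5,6\n"  # hypothetical stand-in for the CSV file


def count_rows(rows):
    """Count rows in any iterable; a list can be counted again and again."""
    count = 0
    for _ in rows:
        count += 1
    return count


with io.StringIO(SAMPLE) as csvfile:
    rows = list(csv.DictReader(csvfile))  # one pass over the file

# Each for loop over the list starts from the beginning, so both counts agree.
first = count_rows(rows)
second = count_rows(rows)
print("ONE:", first, "TWO:", second)  # ONE: 3 TWO: 3
```

The trade-off is memory: the whole file is held as a list of dicts, which may matter for a file with 180,433 rows.
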
Rorschach
  • It's not empty. It's just that `myfunc()` leaves the file cursor at the end of the file. You can use the file's `.seek()` method to reposition the cursor to the start of the file. – PM 2Ring Dec 04 '15 at 06:27
  • `reader` has no `.seek()` method. So, given that the function only has access to `reader`, is there no way to solve this unless I pass `csvfile` as well? @PM2Ring – Rorschach Dec 04 '15 at 06:29
  • Do all your operations in one pass. – TigerhawkT3 Dec 04 '15 at 06:31
  • @TigerhawkT3 Do you mean use one function? – Rorschach Dec 04 '15 at 06:32
  • Sorry, I don't use the `csv` module. But I expect that you can do `csvfile.seek(0)` and then just create a new `DictReader` instance if you really do need to make a 2nd pass over the file (you might also need to explicitly `del reader` before seeking and creating a new reader). But maybe you should explain why you can't do what you need to in a single pass. – PM 2Ring Dec 04 '15 at 06:39
  • [Use `seek()` on the file, not the `reader`.](http://stackoverflow.com/questions/431752/python-csv-reader-how-do-i-return-to-the-top-of-the-file) – TigerhawkT3 Dec 04 '15 at 06:50
  • And no, I don't mean you have to use one function, I mean you should loop over the file only once. I/O is slow, so it's better to avoid reading the same file over and over again whenever possible. – TigerhawkT3 Dec 04 '15 at 06:52
  • Looping once is a good suggestion. If you want or need to go over the same file twice, I'd suggest simply closing the file, reopening it, and constructing a new DictReader. Otherwise it gets messy: you'd be juggling the file handle and the DictReader, reconstructing the DictReader anyway, and passing both from one function to the other, which is generally bad design. – user1269942 Dec 04 '15 at 06:58
  • Ok. I've just done some tests. You don't _need_ to create a new `DictReader` instance: you can just do `csvfile.seek(0)`, but if your CSV contains a header row you will also need to do `next(csvfile)` after seeking to skip the header line. BTW, when using the `csv` module you should _always_ open your CSV files in binary mode, i.e., `csvfile = open('myfile.csv', 'rb')` or even better: `with open(fname, 'rb') as csvfile:`. – PM 2Ring Dec 04 '15 at 07:19
  • @PM2Ring Would it be possible to store the contents of the file in some object or variable and pass that between functions, so that the file I/O happens in just one pass? – Rorschach Dec 04 '15 at 08:03
  • Sure. It's hard to tell from your code exactly what you want to do with the data, but a typical approach would be to do `rows = list(reader)` which creates a `list` of `dict`s. – PM 2Ring Dec 04 '15 at 08:12
  • FWIW, a list is more efficient than a dict with consecutive numeric keys starting from zero. – PM 2Ring Dec 04 '15 at 08:31
  • Thanks for the heads-up, @PM2Ring. I'll change it. – Rorschach Dec 04 '15 at 08:35
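
The close-and-reopen suggestion from the comments can be sketched as follows, assuming each function receives the file path instead of a reader. This is a Python 3 sketch: it opens the file with `newline=""`, which is what the Python 3 `csv` documentation recommends, rather than the binary mode mentioned above for Python 2. The helper `count_rows` and the temporary sample file are hypothetical.

```python
import csv
import os
import tempfile


def count_rows(path):
    """Open the file, build a fresh DictReader, and count its rows.

    Each call opens its own handle, so no function depends on where a
    previous caller left the read cursor.
    """
    with open(path, newline="") as csvfile:
        return sum(1 for _ in csv.DictReader(csvfile))


# Write a small hypothetical CSV so the sketch runs without myfile.csv.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write("a,b\n1,2\n3,4\n5,6\n")

try:
    first = count_rows(path)   # first pass over the file
    second = count_rows(path)  # second pass starts again from the top
    print("ONE:", first, "TWO:", second)  # ONE: 3 TWO: 3
finally:
    os.remove(path)
```

Each pass rereads the file from disk, so for large files the single-pass approaches discussed above are usually cheaper.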