What is the 'Python way' of working with a CSV file? If I want to run some methods on the data in a particular column, should I copy the whole thing into an array, or should I pass the open file into a series of methods?

I tried to return the open file and got this error:

ValueError: I/O operation on closed file

here's the code:

import sys
import os
import csv

def main():
    pass


def openCSVFile(CSVFile, openMode):
    with open(CSVFile, openMode) as csvfile:
        zipreader = csv.reader(csvfile, delimiter=',')
    return zipreader

if __name__ == '__main__':

    zipfile = openCSVFile('propertyOutput.csv','rb')
    numRows = sum(1 for row in zipfile)
    print"Rows equals %d." % numRows
Martin Thoma
DBWeinstein
  • Are you using the csv module (`import csv`)? – John1024 Nov 30 '13 at 21:45
  • yes I am using import csv. – DBWeinstein Nov 30 '13 at 21:47
  • Your question seems either to lack code that reproduces the error (if it's about the error) or to be too broad (if it's about which method is better and when); both are off-topic for SO. Can you either expand your code so a solution can be proposed for this case, or spell out the conditions of your task (file size, types of operations, etc.) to make the question more specific? – alko Nov 30 '13 at 21:51
  • your code is malformed because it is indented with tab instead of 4-spaces, can you please fix it – alko Nov 30 '13 at 22:26
  • Possible duplicate of [How do I read and write CSV files with Python?](http://stackoverflow.com/questions/41585078/how-do-i-read-and-write-csv-files-with-python) – Martin Thoma Feb 05 '17 at 17:01

4 Answers

Well there are many ways you could go about manipulating csv files. It depends largely on how big your data is and how often you will perform these operations.

I will build on the already good answers and comments to present a somewhat more complex handling, that wouldn't be far off from a real world example.

First of all, I prefer csv.DictReader because most csv files have a header row with the column names. csv.DictReader takes advantage of that and lets you access each cell value by its column name.

Also, most of the time you need to perform various validation and normalization operations on the data, so we're going to associate some functions with specific columns.

Suppose we have a csv with information about products. e.g.

Product Name,Release Date,Price
foo product,2012/03/23,99.9
awesome product,2013/10/14,40.5
.... and so on ........

Let's write a program to parse it and normalize the values into appropriate native python objects.

import csv
import datetime
from decimal import Decimal

def stripper(value):
    # Strip any whitespace from the left and right
    return value.strip()

def to_decimal(value):
    return Decimal(value)

def to_date(value):
    # We expect dates like: "2013/05/23"
    return datetime.datetime.strptime(value, '%Y/%m/%d').date()

OPERATIONS = {
    'Product Name': [stripper],
    'Release Date': [stripper, to_date],
    'Price': [stripper, to_decimal]
}

def parse_csv(filepath):
    with open(filepath, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            for column in row:
                operations = OPERATIONS[column]
                value = row[column]
                for op in operations:
                    value = op(value)
                # Print the cleaned value or store it somewhere
                print value

Things to note:

1) We operate on the csv on a line-by-line basis. DictReader yields rows one at a time, which means we can handle csv files of arbitrary size, since we never load the whole file into memory.

2) You can go crazy with normalizing the values of a csv, by building special classes with magic methods or whatnot. As I said, it depends on the complexity of your files, the quality of the data and the operations you need to perform on them.
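
As a rough sketch of the same idea in Python 3 (an assumption; the code above is written for Python 2), the parser can yield cleaned rows instead of printing them, so the caller decides what to do with each one. The names clean_rows and SAMPLE are hypothetical, and columns without registered operations are passed through unchanged, which is a choice made here and not part of the original program.

```python
import csv
import datetime
import io
from decimal import Decimal


def to_date(value):
    # Dates are expected to look like "2013/05/23".
    return datetime.datetime.strptime(value, '%Y/%m/%d').date()


# Each column maps to the functions applied to its raw string, in order.
OPERATIONS = {
    'Product Name': [str.strip],
    'Release Date': [str.strip, to_date],
    'Price': [str.strip, Decimal],
}


def clean_rows(fileobj):
    """Yield one cleaned dict per CSV row read from an open file object."""
    for row in csv.DictReader(fileobj):
        cleaned = {}
        for column, value in row.items():
            # Columns without registered operations pass through unchanged.
            for op in OPERATIONS.get(column, []):
                value = op(value)
            cleaned[column] = value
        yield cleaned


# Hypothetical sample data standing in for a real file.
SAMPLE = (
    "Product Name,Release Date,Price\n"
    "foo product,2012/03/23,99.9\n"
    "awesome product,2013/10/14,40.5\n"
)

rows = list(clean_rows(io.StringIO(SAMPLE)))
total = sum(row['Price'] for row in rows)
```

Because clean_rows is a generator, rows are still processed one at a time; the list() call here is only for the demo.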

Have fun.

rantanplan

The csv module provides one row at a time, parsing its content by splitting it into a list object (or a dict in the case of DictReader).

As Python knows how to loop over such an object, if you're just interested in some specific fields, building a list with these fields seems 'Pythonic' to me. Using an iterator is also valid if each item is to be considered separately from the others.
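
A minimal Python 3 sketch of both options, assuming a file with a header row; the column names zip and price, the sample data, and the helper column_values are made up for illustration.

```python
import csv
import io


def column_values(fileobj, column):
    """Return every value of one named column as a list."""
    return [row[column] for row in csv.DictReader(fileobj)]


# Hypothetical sample data standing in for a real file.
SAMPLE = "zip,price\n94110,100\n94103,250\n"

# Option 1: build a list, which can be reused as often as needed.
zips = column_values(io.StringIO(SAMPLE), 'zip')

# Option 2: iterate lazily when each item is handled on its own.
total_price = sum(int(row['price']) for row in csv.DictReader(io.StringIO(SAMPLE)))
```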

Joël

You probably need to read PEP 343: The 'with' statement

Relevant quote:

Some standard Python objects now support the context management protocol and can be used with the 'with' statement. File objects are one example:

with open('/etc/passwd', 'r') as f:
    for line in f:
        print line
        ... more processing code ...

After this statement has executed, the file object in f will have been automatically closed, even if the 'for' loop raised an exception part-way through the block.

So your csvfile is already closed once the with block ends, and therefore by the time openCSVFile returns. You need to either drop the with statement (in which case nothing closes the file explicitly),

def openCSVFile(CSVFile, openMode):
    csvfile = open(CSVFile, openMode)
    return csv.reader(csvfile, delimiter=',')

or move it to __main__:

def get_csv_reader(filelike):
    return csv.reader(filelike, delimiter=',')

if __name__ == '__main__':
    with open('propertyOutput.csv', 'rb') as csvfile:
        zipfile = get_csv_reader(csvfile)
        numRows = sum(1 for row in zipfile)
        print"Rows equals %d." % numRows
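
A further option, not one of the two above, is to turn the function into a generator so that the with block stays active while rows are being consumed. This is only a sketch for Python 3 (an assumption); the name iter_csv_rows and the temporary demo file are hypothetical.

```python
import csv
import os
import tempfile


def iter_csv_rows(path):
    """Yield rows one at a time; the file stays open while iteration runs."""
    # newline='' is what the Python 3 csv docs recommend when opening files.
    with open(path, newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter=','):
            yield row


# Demo with a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write("a,b\n1,2\n3,4\n")

num_rows = sum(1 for _ in iter_csv_rows(tmp.name))
os.remove(tmp.name)
```

The file is closed once the generator is exhausted. If the caller stops iterating early, it stays open until the generator is closed or garbage-collected.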
alko

First, the reason you're getting ValueError: I/O operation on closed file is that csv.reader does not read anything up front; zipreader only holds a reference to the file object opened by the with statement. As soon as the with block exits, that file is closed, so by the time the returned zipreader is iterated there is nothing left for it to read from:

with open(CSVFile, openMode) as csvfile:
    zipreader = csv.reader(csvfile, delimiter=',')
return zipreader

Generally, acquire the resource and then pass it to a function if needed. So, in your main program, open the file and create the csv.reader, pass that to whatever needs it, and have the main program close the file at the point where it makes sense to say "you're done with it now".
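
A sketch of that division of labour in Python 3 (an assumption), with the caller owning the file and a helper that only sees the reader. The helper load_column, the column layout, and the temporary demo file are hypothetical.

```python
import csv
import os
import tempfile


def load_column(reader, index):
    """Pull one column out of an open reader into a plain list."""
    return [row[index] for row in reader]


def main(path):
    # The caller owns the file: it opens it, hands the reader on, and closes it.
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        zips = load_column(reader, 0)
    # The file is closed here, but the list can still be used any number of times.
    return len(zips), sorted(set(zips))


# Demo with a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write("94110,100\n94103,250\n94110,300\n")

result = main(tmp.name)
os.remove(tmp.name)
```

A reader can be consumed only once, so copying the column into a list before the file closes is what makes it possible to run several methods over the same data.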

Jon Clements