
I wrote a class to deal with large files and I want to make a "write" method for the class so that I can easily make changes to the data in the file and then write out a new file.

What I want to be able to do is:

1.) Read in the original file

sources = Catalog(<filename>)

2.) Make changes on the data contained in the file

for source in sources:
    source['blah1'] = source['blah1'] + 4

3.) Write out the updated value to a new file

sources.catalog_write(<new_filename>)

To this end I wrote a fairly straightforward iterator class:

class Catalog(object):
    def __init__(self, fname):
        self.data = open(fname, 'r')

        self.header = ['blah1', 'blah2', 'blah3']

    def next(self):
        line = self.data.readline()
        line = line.lstrip()
        if line == "":
            self.data.close()
            raise StopIteration()

        cols = line.split()
        if len(cols) != len(self.header):
            print "Input catalog is not valid."
            raise StopIteration()

        for element, col in zip(self.header, cols):
            self.__dict__.update({element:float(col)})

        return self.__dict__.copy()

    def __iter__(self):
        return self

This is my attempt at a write method:

def catalog_write(self, outname):
    with open(outname, "w") as out:
        out.write("    ".join(self.header) + "\n")
        for source in self:
            out.write("    ".join(map(str, source)) + "\n")

But I get the following error when I try to call that method:

 File "/Catalogs.py", line 53, in catalog_write
    for source in self:
  File "/Catalogs.py", line 27, in next
    line = self.data.readline()
ValueError: I/O operation on closed file

I realize that this is because iterators like this are generally a one-time deal, but I know that there are workarounds (like this question and this post). I'm just not sure what the best way to do this is. These files are quite large, and I'd like reading and using them to be as efficient as possible (both time-wise and memory-wise). Is there a pythonic way to do this?

Dex
  • Can you provide the full stack trace? – Tom Dalton Jul 31 '14 at 21:09
  • where does data come from in `self.data.readline()`? – Padraic Cunningham Jul 31 '14 at 21:30
  • @Alexa Based on your updated question, does my answer solve your issue ? – Raghav RV Aug 01 '14 at 00:44
  • @rvraghav93 That does solve the I/O error although I don't quite understand why. Would you mind explaining more? – Dex Aug 01 '14 at 01:34
  • The I/O error occurred because your self.data was closed at the end of the first pass over the file. On the next pass you were trying to read from that closed file, hence the error. It is now modified to reopen the file at the start of the next pass! :) – Raghav RV Aug 03 '14 at 08:44

1 Answer


Assumptions made:

Input File: [ infile ]

1.2 3.4 5.6
4.5 6.7 8.9

Usage:

>>> a = Catalog('infile')
>>> a.catalog_write('outfile')

Now Output File: [ outfile ]

blah1 blah2 blah3
1.2 3.4 5.6
4.5 6.7 8.9

Writing it again to another file: [ outfile2 ]

>>> a.catalog_write('outfile2')

Now Output File: [ outfile2 ]

blah1 blah2 blah3
1.2 3.4 5.6
4.5 6.7 8.9

So, from what you have posted, it looks like you need to reopen your data file (assuming self.data is the file object and the file name is stored as self.fname).
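For reference, the failure can be reproduced in isolation. This is a minimal Python 3 sketch that uses an in-memory io.StringIO in place of the real catalog file (an assumption, made only to keep it self-contained); reading after close() raises ValueError, as in the traceback from the question.

```python
import io

# Stand-in for the catalog file; the contents are made up for illustration.
data = io.StringIO("1.2 3.4 5.6\n")

first_line = data.readline()  # works while the stream is open
data.close()                  # what next() does when it reaches the end

try:
    data.readline()           # a second pass would start here
    error = None
except ValueError as exc:
    error = exc               # reading from a closed stream is rejected
```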

Modify your __init__ to save fname as an attribute.

Create a placeholder data object initially. I am not opening the file in __init__, so that it can be opened and closed as needed entirely inside your next() method. The placeholder only needs a closed attribute, like a file object has, so that next() can check whether self.data.closed is True, reopen the file, and read from it.

def __init__(self, fname):
    self.fname = fname
    # Placeholder with a `closed` attribute, standing in for a closed file.
    self.data = lambda: None
    self.data.closed = True
    self.header = ['blah1', 'blah2', 'blah3']
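A note on the placeholder, as a small Python 3 sketch: a bare object() instance rejects new attributes, whereas a function object such as lambda: None accepts them, which is why the latter can carry closed. An arguably simpler option, which is my own assumption and not part of the code above, is to start with None and test for it; needs_open is a hypothetical helper name.

```python
# A function object accepts arbitrary attributes.
placeholder = lambda: None
placeholder.closed = True

# A bare object() instance does not.
try:
    object().closed = True
    plain_object_ok = True
except AttributeError:
    plain_object_ok = False


def needs_open(data):
    """Hypothetical helper: True if `data` is missing or already closed."""
    return data is None or data.closed
```

With this, next() could begin with `if needs_open(self.data):` and __init__ could simply set `self.data = None`.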

The next() method is then modified as follows:

def next(self):
    if self.data.closed:
        self.data = open(self.fname, "r")
    line = self.data.readline()
    line = line.lstrip()
    if line == "":
        if not self.data.closed:
            self.data.close()
        raise StopIteration()

    cols = line.split()
    if len(cols) != len(self.header):
        print "Input catalog is not valid."
        if not self.data.closed:
            self.data.close()
        raise StopIteration()

    for element, col in zip(self.header, cols):
        self.__dict__.update({element:float(col)})

    return self.__dict__.copy()

Your catalog_write method should then look as follows. Note that any modifications to the data must be made inside the for loop, as shown:

def catalog_write(self, outname):
    with open(outname, "w") as out:
        out.write("    ".join(self.header) + "\n")
        for source in self:
            source['blah1'] = 444 # Data modified.
            out.write("    ".join(str(source[name]) for name in self.header) + "\n")

I assumed that you want the updated values of the headers written as a column in the outname file.
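As an alternative to tracking the open/closed state by hand, the same reopen-on-every-pass idea can be sketched with a generator-based __iter__. This is only a sketch under some assumptions, not the code from the question: it is written for Python 3 (the code above is Python 2), it skips blank lines rather than stopping at the first one, it raises ValueError on a malformed line instead of printing, and the modify argument and add_four function are hypothetical names introduced for illustration. Rows are still read one at a time, so memory use stays small for large files.

```python
import os
import tempfile


class Catalog(object):
    """Iterate over a whitespace-separated catalog, reopening it on every pass."""

    def __init__(self, fname):
        self.fname = fname
        self.header = ['blah1', 'blah2', 'blah3']

    def __iter__(self):
        # Generator method: every `for source in catalog` opens the file
        # afresh, so the catalog can be iterated any number of times.
        with open(self.fname, 'r') as data:
            for line in data:
                cols = line.split()
                if not cols:
                    continue  # skip blank lines (differs from the original)
                if len(cols) != len(self.header):
                    raise ValueError("Input catalog is not valid.")
                yield dict(zip(self.header, map(float, cols)))

    def catalog_write(self, outname, modify=None):
        # `modify` is a hypothetical hook: a callable that receives each
        # row dict and may change it in place before it is written.
        with open(outname, 'w') as out:
            out.write("    ".join(self.header) + "\n")
            for source in self:
                if modify is not None:
                    modify(source)
                out.write("    ".join(str(source[name]) for name in self.header) + "\n")


# Demo with a throwaway input file (paths are made up for illustration).
tmpdir = tempfile.mkdtemp()
infile = os.path.join(tmpdir, 'infile')
with open(infile, 'w') as f:
    f.write("1.2 3.4 5.6\n4.5 6.7 8.9\n")


def add_four(source):
    source['blah1'] = source['blah1'] + 4


sources = Catalog(infile)
outfile = os.path.join(tmpdir, 'outfile')
outfile2 = os.path.join(tmpdir, 'outfile2')
sources.catalog_write(outfile, modify=add_four)
sources.catalog_write(outfile2)  # second pass works: the file is reopened
```

Because the row dicts are created fresh on each pass, changes made in an external for loop are not remembered by a later catalog_write call; that is why the modification is passed in here.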

Raghav RV
  • how are you supposed to specify the file to write to? – Padraic Cunningham Jul 31 '14 at 21:38
  • @PadraicCunningham isn't that the one initiated to as `fname` attribute ? – Raghav RV Jul 31 '14 at 21:40
  • no, the op seems to want to take in a file and write to another, why would you use `self.fname` and refer to it later as `self.data`? – Padraic Cunningham Jul 31 '14 at 21:41
  • Oh ! looks like I understood it wrong ... thanks for pointing out ! – Raghav RV Jul 31 '14 at 21:42
  • no worries, I think the OP needs to supply the actual code they are using as what they have posted does not match the error. – Padraic Cunningham Jul 31 '14 at 21:43
  • Thanks for the response. Is there any way to short circuit the next method under a certain condition? Some of the lines in the files are just comment lines, denoted with a '#', that I'd like to be able to skip over. Is there any way to do `if line[0] == '#': something to skip to next line in file`? – Dex Aug 01 '14 at 02:34
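Regarding the last comment about skipping '#' lines: one possible approach, sketched here in Python 3, is to read inside a loop and use continue to move on to the next line. The function iter_rows and its parameters are hypothetical names, and it takes any iterable of lines (an open file works too). In a next()-style method the equivalent would be a while loop around self.data.readline().

```python
def iter_rows(lines, header, comment='#'):
    """Yield one dict per data line, skipping blank and comment lines.

    `iter_rows` and its parameters are hypothetical names for this sketch.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment):
            continue  # short-circuit: go straight to the next line
        cols = stripped.split()
        if len(cols) != len(header):
            raise ValueError("Input catalog is not valid.")
        yield dict(zip(header, map(float, cols)))


# Example input, made up for illustration.
sample = ["# a comment line\n", "1.2 3.4 5.6\n", "\n", "4.5 6.7 8.9\n"]
rows = list(iter_rows(sample, ['blah1', 'blah2', 'blah3']))
```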