
I've got 1.6GB available to use in a Python process. I'm writing a large CSV file whose data comes from a database. The problem is: after the file is written, the memory (>1.5GB) is not released immediately, which causes an error in the next bit of code (allocating memory fails because the OS cannot find enough memory to allocate).

Does any function exist that would help me release that memory? Or do you have a better way to do this?

This is the script I'm using to write the file; it writes in chunks to deal with the memory issue:

import csv

# rs_cursor, headers, fname_location and out_fname are defined earlier in the script
size_to_read = 20000
sqlData = rs_cursor.fetchmany(size_to_read)

c = csv.writer(open(fname_location, "wb"))
c.writerow(headers)

print("- Generating file %s ..." % out_fname)

while sqlData:
    for row in sqlData:
        c.writerow(row)
    sqlData = rs_cursor.fetchmany(size_to_read)
  • Is the code that errors out after the last line? or is it the last line that it errors on? – Adam Aug 18 '16 at 18:47
  • No, it goes hundreds of lines down, which use low memory, and the error comes in this line: `file_obs = int(subprocess.check_output(["cat %s/%s | wc -l" % (locationToUpload, filename)], shell=True))` – ultraInstinct Aug 18 '16 at 18:54
  • I might be misunderstanding but... `sqlData = rs_cursor.fetchmany(size_to_read)` on the last line? Everything you just wrote to a file (which you haven't closed), you load it all again? – roganjosh Aug 18 '16 at 18:54
  • Your indentation is off. Please fix to match your actual code, because you have several loops and we cannot be sure when some terminate – roganjosh Aug 18 '16 at 18:55
  • It sounds like the memory may be being used for buffers on the file or objects used by the csv.writer. You could test that by adding `del c` after the code above, which should prompt Python to recover memory used by c and close the file handle which is only referenced by c. In general, it would be better to keep a handle for the file, and then explicitly close it when you're done. See http://stackoverflow.com/questions/3347775/csv-writer-not-closing-file – Matthias Fripp Aug 18 '16 at 19:57
  • @boardrider `del` is a keyword not a function – Two-Bit Alchemist Aug 19 '16 at 15:11
  • Thanks, @Two-Bit, `del` is indeed a statement. – boardrider Aug 20 '16 at 13:47
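
A minimal sketch of the explicit-handle / `del c` approach suggested in the last comment, reusing the names from the question (rs_cursor, headers, fname_location and size_to_read are assumed to be defined as in the question's code):

import csv

f = open(fname_location, "wb")   # keep an explicit handle to the file
c = csv.writer(f)
c.writerow(headers)

sqlData = rs_cursor.fetchmany(size_to_read)
while sqlData:
    for row in sqlData:
        c.writerow(row)
    sqlData = rs_cursor.fetchmany(size_to_read)

f.close()  # flushes the write buffers and frees the handle as soon as writing is done
del c      # drops the last reference to the writer so it can be garbage-collected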

1 Answer


I am thinking the issue is that you never closed the file. Give this a shot.

size_to_read = 20000
sqlData = rs_cursor.fetchmany(size_to_read)

with open(fname_location, "wb") as f:
    c = csv.writer(f)
    c.writerow(headers)

print("- Generating file %s ..." % out_fname)

while sqlData:
    with open(fname_location, "a") as f: # "a" means to append
        c = csv.writer(f)
        for row in sqlData:
            c.writerow(row)
    sqlData = rs_cursor.fetchmany(size_to_read)

By using with, the file is closed automatically, which flushes its buffers and releases that memory. It also avoids having to call f.close() explicitly.

Also, I believe you can avoid the inner loop like so...

while sqlData:
    with open(fname_location, "a") as f:  # append, so earlier chunks are not overwritten
        c = csv.writer(f)
        c.writerows(sqlData)  # .writerows writes the whole chunk at once
    sqlData = rs_cursor.fetchmany(size_to_read)

Hard to replicate since I don't have the data :(

EDIT

I know this is not really an answer, but check out the memory_profiler package to do a line-by-line assessment of where you're using a lot of memory. https://pypi.python.org/pypi/memory_profiler
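
For example, a minimal sketch of how its @profile decorator could be wrapped around the writing code (the function name and signature here are placeholders, not from the question):

import csv
from memory_profiler import profile

@profile  # prints a line-by-line memory report when the decorated function runs
def generate_file(cursor, fname_location, headers, size_to_read=20000):
    with open(fname_location, "wb") as f:
        c = csv.writer(f)
        c.writerow(headers)
        rows = cursor.fetchmany(size_to_read)
        while rows:
            c.writerows(rows)
            rows = cursor.fetchmany(size_to_read)

Run the script as usual and memory_profiler will show the memory increment attributed to each line of generate_file.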

EDIT 2

Here is an example of using a generator to keep your memory usage low.

def results_iter(cursor, n=10000):
    while True:
        results = cursor.fetchmany(n)
        if not results:
            break
        for result in results:
            yield result

with open('file.csv', 'wb') as f:
    c = csv.writer(f)
    for result in results_iter(rs_cursor, size_to_read):
        c.writerow(result)

via http://code.activestate.com/recipes/137270-use-generators-for-fetching-large-db-record-sets/

If any of this works let us know!

  • Yeah, I don't really know; like I said, I don't know what the data looks like, or what he means by `size_to_read`. But thanks for the down vote! – Adam Aug 18 '16 at 19:00
  • I'd say repeatedly opening it in the event operations in between (i.e. grabbing more data) take up a ton of memory and crash the app. But yes the scope of the loops are confusing. And publishing something wrong does have value because people may pick out issues you didn't think of. – Adam Aug 18 '16 at 19:08
  • Basically, I'm using the loop because without it, the system will run out of memory whilst writing the file. So, the idea I had is to write chunks of data until it's all done. The size_to_read is the number of lines the process writes in each loop. – ultraInstinct Aug 18 '16 at 19:26
  • @ebertbm yes, but your syntax in the question is invalid. Please correct the formatting of the question so that indentation matches your actual code. Only _then_ can we understand your issue. – roganjosh Aug 18 '16 at 19:27
  • @ebertbm please update the question so relevant info is centralized. Also, sorry rogan, I did not see that. – Adam Aug 18 '16 at 19:33
  • Thanks. I edited the question, I didn't notice that missing indentation in the format of the question. – ultraInstinct Aug 18 '16 at 19:47
  • `c.writerows(row)` should probably be `c.writerows(sqlData)`. Also, `with csv.writer(...) as c:` may automatically close the file, but that would only be a side-effect of garbage collection when `c` gets replaced by a new `c` (at that point, the old `c` holds the last reference to the file handle, so the file handle will also get collected, which should cause it to be closed). It may be better to start with `with open(...) as f:` then `c = csv.writer(f)`. See http://stackoverflow.com/questions/3347775/csv-writer-not-closing-file – Matthias Fripp Aug 18 '16 at 20:00