I am getting a memory error every time I try to write to CSV. The first 5 GB of data works fine, but then I get a MemoryError.

I don't know why, because I clear each elem from memory every time, so this should not happen.

import csv
import pandas as pd
from xml.etree.ElementTree import iterparse

def writeDataCSV(file):
    try:
        with open('Data/csv/'+file+'.csv','w') as fp:
            for evt, elem in iterparse('dumpData/'+str(file)+'.xml', events=('end',)):
                if elem.tag == 'row':
                    element_fields = elem.attrib
                    data = []

                    if(file== "Comments"):
                        data = commentsXML(element_fields)
                        wr = csv.writer(fp, dialect='excel')
                        wr.writerow(data)
                        elem.clear()
        fp.close
    except UnicodeEncodeError as uniError:
        print(uniError)
    try:
        if(file== "Comments"):
            df = pd.read_csv('Data/csv/Comments.csv', names=["Id","PostId","Score","Text","Date","Time","UserID"])
            df.to_csv("Data/csv/Comments.csv")
    except UnicodeDecodeError as uniDeError:
        print(uniDeError)

MemoryError

  • I'm curious what happens when you narrow your `try`/`except` down to the part of your code that you think will fail. Currently you have one that covers the entire procedure, but it might be only a single line that fails, if any. – roganjosh May 29 '18 at 22:06
  • Does `file== "Comments"` hold when you are running this? Because it looks like you only clear if that's the case (and when `elem.tag == 'row'`). If you're trying to read a huge XML file, you might want to start by commenting out some of the CSV-related code to narrow down where the problem is occurring. – J. Owens May 29 '18 at 22:29
  • `wr = csv.writer` is never closed (or flushed), esp. if you take an exception. It's always better to open files with a `with ...` statement. `fp.close` does nothing, it does not do `fp.close()`, but in any case that should happen automatically due to the `with open(...) as fp`, but again if you take an exception it might not. And where is handling for other exceptions like `MemoryError`? Move your `with` statement above the try-except ladder, and make sure to handle other exceptions. – smci May 29 '18 at 23:08
  • Anyway **it's mistaken to read in 5 GB and write it out without chunking** (see the chunked sketch after this comment thread). The memory usage will be insane, and if you take an exception you lose everything. What is `commentsXML`, where does it come from, how large is it, and what does `commentsXML(element_fields)` do, is it a lookup? – smci May 29 '18 at 23:09
  • **This is a near-duplicate; there are lots of examples of how to write CSV files in chunks, please pick one and close this: [How do you split reading a large csv file into evenly-sized chunks in Python?](https://stackoverflow.com/questions/4956984/how-do-you-split-reading-a-large-csv-file-into-evenly-sized-chunks-in-python), [How to read a 6 GB csv file with pandas](https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas), [Read, format, then write large CSV files](https://stackoverflow.com/questions/44122943/read-format-then-write-large-csv-files)...** – smci May 29 '18 at 23:15
  • There are 132 hits for [\[python\] write csv chunk](https://stackoverflow.com/search?q=%5Bpython%5D+write+csv+chunk), so this is a duplicate. – smci May 29 '18 at 23:16
  • [Lazy Method for Reading Big File in Python?](https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python), [Process data, much larger than physical memory](https://stackoverflow.com/questions/17710748/process-data-much-larger-than-physical-memory/)... – smci May 29 '18 at 23:20
  • You got "MemoryError" and thought that copying the exception text verbatim into the question couldn't help us?! – Antti Haapala -- Слава Україні May 29 '18 at 23:37
  • @smci Do we have a problem? Spamming around doesn't help the community. – HanahDevelope May 30 '18 at 12:18
  • @AnttiHaapala I am getting the exception in `writeDataCSV`. – HanahDevelope May 30 '18 at 13:20
  • @J.Owens I am trying to write each row out first and then clear the memory (the list for the comment). – HanahDevelope May 30 '18 at 13:21
  • @roganjosh I can remove it. I got a Unicode exception earlier, but I fixed it. – HanahDevelope May 30 '18 at 13:22
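
A minimal chunked sketch of the pandas post-processing step the comments point to (assumptions: the column names are the ones from the question, `chunksize=100_000` is an arbitrary choice, and `Comments_clean.csv` is an illustrative output name so the input file is not rewritten while it is still being read):

import pandas as pd

cols = ["Id", "PostId", "Score", "Text", "Date", "Time", "UserID"]

# Stream the big CSV in fixed-size chunks instead of one giant DataFrame.
reader = pd.read_csv('Data/csv/Comments.csv', names=cols, chunksize=100_000)

for i, chunk in enumerate(reader):
    # Write the header once, then append; only one chunk is in memory at a time.
    chunk.to_csv('Data/csv/Comments_clean.csv',
                 mode='w' if i == 0 else 'a',
                 header=(i == 0),
                 index=False)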

1 Answer


A bit too many responsibilities inside your function: it is difficult to read, hard to debug, and generally not an example to follow.

My best guess for avoiding the memory error is to separate the reading and writing parts of the code into their own functions, in the style of:

import csv
from xml.etree.ElementTree import iterparse

# FIXME: commentsXML is some global function

def get_data(filename):
    for evt, elem in iterparse('dumpData/'+str(filename)+'.xml', events=('end',)):
        if elem.tag == 'row':
            yield commentsXML(elem.attrib)

def save_stream_to_csv_file(gen, target_csv_filename):
    with open('Data/csv/'+target_csv_filename+'.csv','w') as fp:
        wr = csv.writer(fp, dialect='excel')
        for data in gen:
            wr.writerow(data)

gen = get_data('your_source_filename')
save_stream_to_csv_file(gen, 'your_target_filename')

# WONTFIX: 'dumpData/'+str(filename)+'.xml' and
#          'Data/csv/'+target_csv_filename+'.csv' are a bit ugly;
#          os.path.join() and .format() highly welcome
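
One caveat on the generator above: `get_data` never frees the parsed elements, so the tree can still grow while the rows stream out. A minimal sketch of a variant (assuming `xml.etree.ElementTree.iterparse` and the same global `commentsXML` as the FIXME above) that clears each element, and the root's references to it, once the row has been yielded:

from xml.etree.ElementTree import iterparse

def get_data(filename):
    # Ask for 'start' events too, so the root element can be grabbed up front.
    context = iterparse('dumpData/'+str(filename)+'.xml',
                        events=('start', 'end'))
    _, root = next(context)  # the first 'start' event carries the root
    for event, elem in context:
        if event == 'end' and elem.tag == 'row':
            yield commentsXML(elem.attrib)
            elem.clear()   # free this element's children and text
            root.clear()   # drop the root's references to finished children

Without the clears, every completed `row` stays reachable from the root, which matches the steadily growing memory the question describes.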
Evgeny