
Right now I am combining many JSON files, transforming the data, and then writing it out as rows to a CSV file. The problem is that this data will grow exponentially, and I may run into memory errors in the future.

How would I go about writing to CSV, but once the file size is greater than 1GB start writing to a new file?

This is my code for writing to one file:

import csv
import logging

with open('foo.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, delimiter="|")  # create the writer once, outside the loop
    for response_row in load_json():
        try:
            writer.writerow(response_row)
        except Exception as e:
            logging.critical(str(e))
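
One way to do this, sketched below, is to keep a running count of how much has been written and switch to a new numbered file once it passes the limit. The sketch assumes the load_json() generator from the snippet above; the write_rotated_csv helper, the file name pattern and the 1 GB threshold are only illustrative, and the character count is an approximation of the byte count rather than an exact size.

import csv
import logging

MAX_BYTES = 1024 ** 3   # start a new file after roughly 1 GB

def write_rotated_csv(rows, base_name='foo'):
    part = 0
    written = 0
    f = open(f'{base_name}_{part}.csv', 'w', encoding='utf-8', newline='')
    writer = csv.writer(f, delimiter="|")
    try:
        for row in rows:
            try:
                # writerow() returns whatever the underlying write() returned,
                # i.e. the number of characters written; for mostly-ASCII data
                # that is a reasonable approximation of the byte count.
                written += writer.writerow(row) or 0
            except Exception as e:
                logging.critical(str(e))
            if written >= MAX_BYTES:
                # close the current part and open the next numbered file
                f.close()
                part += 1
                written = 0
                f = open(f'{base_name}_{part}.csv', 'w', encoding='utf-8', newline='')
                writer = csv.writer(f, delimiter="|")
    finally:
        f.close()

write_rotated_csv(load_json())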
  • Is 1GB the limit on max CSV file size? – Morse Mar 01 '18 at 20:40
  • That is what I would like, yes – j doe Mar 01 '18 at 20:41
  • If it's the limit then I would write into multiple files. – Morse Mar 01 '18 at 20:43
  • @Prateek that's _precisely_ what the OP is asking about. They want to automatically start writing to a new file if the current one approaches their limit. – roganjosh Mar 01 '18 at 20:51
  • Maybe something like this? https://www.blog.pythonlibrary.org/2014/02/11/python-how-to-create-rotating-logs/ – G_M Mar 01 '18 at 20:54
  • I don't think this is a duplicate of the other question as marked. For the user asking the question, if you're writing the file line-by-line, you could add up the lengths of each line and swap to a new write file when the sum of lengths reaches a threshold (each character is one byte). – ividito Mar 01 '18 at 20:55
  • @ividito isn't that what [`RotatingFileHandler`](https://docs.python.org/2/library/logging.handlers.html#logging.handlers.RotatingFileHandler) does already? – G_M Mar 01 '18 at 21:01
  • @ividito I think that is the only way to go. Thank you for your input. – j doe Mar 01 '18 at 21:01
  • @DeliriousLettuce I'll look into [RotatingFileHandler](https://docs.python.org/2/library/logging.handlers.html#logging.handlers.RotatingFileHandler) too. Seems like it could help in this case – j doe Mar 01 '18 at 21:03
  • @jdoe That first link I commented shows a usage of it as well. I might try to write something for fun with it (I've never used it before, only read about it) – G_M Mar 01 '18 at 21:03
  • @DeliriousLettuce it is part of the `logging` library though. It seems like exactly what the OP wants, but I'm not sure it's a good solution to their current setup or how to implement it cleanly with CSV – roganjosh Mar 01 '18 at 21:05
  • @DeliriousLettuce Thank you for the answer I'll give it a try, and if it doesn't work I'll manually do it as ividito suggested – j doe Mar 01 '18 at 21:05
  • @roganjosh I know, that's why it might be a fun experiment to implement it with csvs – G_M Mar 01 '18 at 21:05
  • @DeliriousLettuce I suspect it would also make a good answer at the end :) I'll have a quick scan for dupes – roganjosh Mar 01 '18 at 21:06
  • @roganjosh Well, I'm sure you'll beat me to it (you are fast!) but I might try anyways! – G_M Mar 01 '18 at 21:07
  • @DeliriousLettuce Oh no, I'm not going to try to implement it myself, you can have that :) But I'm going to check whether there is a clean solution to the problem already (since, for now, you can't post anything with this being tagged as a dupe). I think that tag might need to be removed. – roganjosh Mar 01 '18 at 21:09
  • I suspect that [this](https://stackoverflow.com/questions/27430555/stop-python-script-from-writing-to-file-after-it-reaches-a-certain-size-in-linux) would be a decent starting point inside a loop to open a new file – roganjosh Mar 01 '18 at 21:12
  • @martineau Based on the comment trail, I don't think the dupe target is appropriate. The OP seems to be looking for something like the [`RotatingFileHandler`](https://docs.python.org/2/library/logging.handlers.html#rotatingfilehandler) from the `logging` module but for CSV output. In other words, a continuous process. – roganjosh Mar 01 '18 at 21:39
  • It's not a duplicate. The link given was about splitting a file; here it's the opposite. – Morse Mar 02 '18 at 00:43
  • Why would writing files > 1GB cause a memory issue? You're not holding the file in memory all at once; you're just writing to the file. – Constance Mar 21 '18 at 09:12
  • @Constance It is just to future proof the script. The data will be growing exponentially. – j doe Mar 21 '18 at 16:17
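
The RotatingFileHandler idea discussed in the comments above could look roughly like the sketch below: each row is rendered with the csv module and then logged through a handler that rolls over once the current file exceeds maxBytes. The format_row helper and the maxBytes/backupCount values are illustrative, and load_json() is again assumed from the question.

import csv
import io
import logging
from logging.handlers import RotatingFileHandler

# A dedicated logger whose only job is to emit CSV rows; RotatingFileHandler
# rolls over to foo.csv.1, foo.csv.2, ... once the current file exceeds maxBytes.
csv_logger = logging.getLogger('csv_writer')
csv_logger.setLevel(logging.INFO)
csv_logger.propagate = False  # keep CSV rows out of any other logging handlers
handler = RotatingFileHandler('foo.csv', maxBytes=1024 ** 3, backupCount=100,
                              encoding='utf-8')
handler.setFormatter(logging.Formatter('%(message)s'))  # just the row, no log prefix
csv_logger.addHandler(handler)

def format_row(row):
    # Reuse the csv module for delimiting/quoting, then hand the finished
    # line (minus its trailing newline) to the logger.
    buf = io.StringIO()
    csv.writer(buf, delimiter="|").writerow(row)
    return buf.getvalue().rstrip('\r\n')

for response_row in load_json():
    try:
        csv_logger.info(format_row(response_row))
    except Exception as e:
        logging.critical(str(e))

One caveat of this approach: the handler rotates by renaming, so the newest rows always live in foo.csv while older chunks are shifted to foo.csv.1, foo.csv.2, and so on.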

0 Answers