
I want to see if there is a more efficient way of writing over a million rows of data to a text file. I'm currently calling `write` over a million times, which I believe is costly. I'm looping through each item in a huge S3 bucket and writing each object's id plus some metadata to a text file one by one. Should I loop through the bucket first, store everything in a list or dict, and then write that entire list/dict at once? Or keep writing one item at a time?

list_of_million = [1, 2, 3, 4, 5, ...]

with open("Output.txt", "wb") as text_file:
    for data in list_of_million:
        text_file.write(f"{data}\n".encode())
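
For illustration, a minimal sketch of the batch-everything-then-write variant described above; the row contents and the `rows` name are placeholders, not the real S3 data:

# Sketch: generate the rows first, then hand them all to writelines() in one call.
# Placeholder data; in practice each row would be "object id + metadata".
rows = (f"{i},example-metadata" for i in range(1_000_000))

with open("Output.txt", "w") as text_file:
    # writelines() still goes through the same buffered file object,
    # but avoids a million explicit write() calls at the Python level.
    text_file.writelines(f"{row}\n" for row in rows)

Either way the data passes through Python's buffered I/O, so the saving is mainly fewer Python-level calls rather than fewer system calls; timing both variants on the target machine is the only reliable comparison.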
franchyze923
  • You are using a buffered file object, so you aren't actually incurring the cost of writing 1 million times. You can try passing a large number for the buffering parameter, if RAM isn't an issue. – juanpa.arrivillaga Sep 19 '19 at 22:52
  • Possible duplicate of [Fastest way to write huge data in file](https://stackoverflow.com/questions/27384093/fastest-way-to-write-huge-data-in-file) – Kevin Welch Sep 19 '19 at 22:53
  • @juanpa.arrivillaga can you explain a little further? What do you mean by buffered file object? – franchyze923 Sep 19 '19 at 22:56
  • The object returned by `open("Output.txt", "wb")`. In any case, it is using a heuristic to arrive at a reasonable buffering size under the hood (the page size of your system). You can try making a large buffer, but ultimately you should empirically see how the buffering is affecting runtime, because it depends on the particulars of your system/hardware (see the sketch after these comments). – juanpa.arrivillaga Sep 19 '19 at 22:57
  • a) What are you optimizing for? Memory use? Execution time? b) Why is this tagged csv? c) are you writing `"data"` or `data` d) Why are you doing this, what's an example. You just need to be a ton more specific about what you're doing. –  Sep 19 '19 at 23:20
  • Sorry, some typos on my part. I should not have put the csv tag, and I'm writing `data`. I'm doing this to get an inventory of items in an S3 bucket as well as some associated metadata for each item, and I want to write all of this to a text file. Optimizing for execution time, but I don't have unlimited memory; about 32 GB available. – franchyze923 Sep 19 '19 at 23:24
  • @WarpDriveEnterprises Forgot to tag – franchyze923 Sep 19 '19 at 23:35
  • S3 has an inventory feature which will give a list of all objects in a bucket. It puts the inventory lists in a bucket of your choice. – cementblocks Sep 20 '19 at 00:46
  • @cementblocks I was not aware. Am I also able to grab metadata from each object? – franchyze923 Sep 20 '19 at 01:45
  • Unfortunately no. – cementblocks Sep 20 '19 at 13:39
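
Following up on the buffering suggestion and the S3 inventory discussion in the comments above, here is a minimal sketch of streaming the listing straight to a file with a larger write buffer. The bucket name, line format, and 1 MiB buffer size are illustrative; the boto3 calls are the standard `list_objects_v2` paginator, which returns the key, size, and last-modified time for each object (custom object metadata would still need separate `head_object` calls).

import boto3

# Sketch: stream object keys + listing metadata straight to the file while
# paginating, using a larger-than-default write buffer.
# "my-example-bucket" and the 1 MiB buffer size are placeholders.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

with open("Output.txt", "w", buffering=1024 * 1024) as text_file:
    for page in paginator.paginate(Bucket="my-example-bucket"):
        for obj in page.get("Contents", []):
            # Key, Size and LastModified are included in each listing page,
            # so no extra per-object request is needed for these fields.
            text_file.write(f"{obj['Key']},{obj['Size']},{obj['LastModified']}\n")

Whether the bigger buffer actually helps depends on the filesystem and hardware, so it is worth timing against the default.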

0 Answers