
I want to see if there is a more efficient way of writing over a million rows of data to a text file. I'm currently calling `write` over a million times, which I believe is costly. I'm looping through each item in a huge S3 bucket and writing each object's id plus some metadata to a text file one by one. Should I loop through the bucket first, store everything in a list or dict, and then write that entire list/dict at once? Or keep writing one item at a time?

list_of_million = [1, 2, 3, 4, 5, ...]

with open("Output.txt", "wb") as text_file:
    for data in list_of_million:
        text_file.write(f"{data}\n".encode())
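
For illustration, a minimal sketch of the batch-everything-then-write variant described above; the row contents and the `rows` name are placeholders, not the real S3 data:

# Sketch: generate the rows first, then hand them all to writelines() in one call.
# Placeholder data; in practice each row would be "object id + metadata".
rows = (f"{i},example-metadata" for i in range(1_000_000))

with open("Output.txt", "w") as text_file:
    # writelines() still goes through the same buffered file object,
    # but avoids a million explicit write() calls at the Python level.
    text_file.writelines(f"{row}\n" for row in rows)

Either way the data passes through Python's buffered I/O, so the saving is mainly fewer Python-level calls rather than fewer system calls; timing both variants on the target machine is the only reliable comparison.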
franchyze923
  • You are using a buffered file object, so you aren't actually incurring the cost of writing 1 million times. You can try passing a large number for the buffering parameter, if RAM isn't an issue. – juanpa.arrivillaga Sep 19 '19 at 22:52
  • Possible duplicate of [Fastest way to write huge data in file](https://stackoverflow.com/questions/27384093/fastest-way-to-write-huge-data-in-file) – Kevin Welch Sep 19 '19 at 22:53
  • @juanpa.arrivillaga can you explain a little further? What do you mean by buffered file object? – franchyze923 Sep 19 '19 at 22:56
  • The object returned by `open("Output.txt", "wb")`. In any case, it is using a heuristic to arrive at a reasonable buffering size under the hood (the page size of your system). You can try making a large buffer, but ultimately you should empirically see how the buffering is affecting runtime, because it depends on the particulars of your system/hardware (see the sketch after these comments). – juanpa.arrivillaga Sep 19 '19 at 22:57
  • a) What are you optimizing for? Memory use? Execution time? b) Why is this tagged csv? c) are you writing `"data"` or `data` d) Why are you doing this, what's an example. You just need to be a ton more specific about what you're doing. –  Sep 19 '19 at 23:20
  • Sorry, some typos on my part. I should not have put the csv tag, and I'm writing `data`. I'm doing this to get an inventory of items in an S3 bucket as well as some associated metadata for each item, and I want to write all of this to a text file. Optimizing for execution time, but I don't have unlimited memory; about 32 GB available. – franchyze923 Sep 19 '19 at 23:24
  • @WarpDriveEnterprises Forgot to tag – franchyze923 Sep 19 '19 at 23:35
  • S3 has an inventory feature which will give a list of all objects in a bucket. It puts the inventory lists in a bucket of your choice. – cementblocks Sep 20 '19 at 00:46
  • @cementblocks I was not aware. Am I also able to grab metadata from each object? – franchyze923 Sep 20 '19 at 01:45
  • Unfortunately no. – cementblocks Sep 20 '19 at 13:39
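
Following up on the buffering suggestion and the S3 inventory discussion in the comments above, here is a minimal sketch of streaming the listing straight to a file with a larger write buffer. The bucket name, line format, and 1 MiB buffer size are illustrative; the boto3 calls are the standard `list_objects_v2` paginator, which returns the key, size, and last-modified time for each object (custom object metadata would still need separate `head_object` calls).

import boto3

# Sketch: stream object keys + listing metadata straight to the file while
# paginating, using a larger-than-default write buffer.
# "my-example-bucket" and the 1 MiB buffer size are placeholders.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

with open("Output.txt", "w", buffering=1024 * 1024) as text_file:
    for page in paginator.paginate(Bucket="my-example-bucket"):
        for obj in page.get("Contents", []):
            # Key, Size and LastModified are included in each listing page,
            # so no extra per-object request is needed for these fields.
            text_file.write(f"{obj['Key']},{obj['Size']},{obj['LastModified']}\n")

Whether the bigger buffer actually helps depends on the filesystem and hardware, so it is worth timing against the default.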

0 Answers