Say I have a 5GB data file on disk, and I want to append another 100MB of data at the end of the file -- just append, without modifying or moving the original data in the file. I know I could read the whole file into memory as a long list, append my new data to it, and write it back out, but that's too slow. How can I do this more efficiently, i.e. without reading the whole file into memory?
I have a script that generates a large stream of data, say 5GB, as a long list, and I need to save this data to a file. I tried generating the whole list first and then writing it all out at once, but as the list grew the computer slowed down very severely. So I decided to write the data out in several passes: each time I have about 100MB in the list, I write it out and clear the list (this is why I have the first problem). I have no idea how to do this. Is there any library or function that can do this?
-
You want to append to the existing file. See [this answer](http://stackoverflow.com/a/1466036/4014959) for a summary of the standard file modes. – PM 2Ring Sep 27 '16 at 14:11
-
for part 1, just open your file in append mode. for part 2, please show us your code so we can help you fixing it. – Jean-François Fabre Sep 27 '16 at 14:12
1 Answer
Let's start with the second point: if the list you keep in memory grows larger than the available RAM, the computer starts using the disk as RAM (swapping), and this slows everything down severely. The optimal way to write output in your situation is to fill the RAM as much as possible (always keeping enough space for the rest of the software running on your PC) and then write it all to the file at once.
The fastest way to store a list in a file is usually pickle,
since you store binary data that takes much less space than formatted text (so the read/write process is much faster too).
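As a minimal sketch of this idea (the file name `data.pkl` and chunk contents are just placeholders), each `pickle.dump()` call in binary append mode adds an independent record at the end of the file, so the existing data is never read or rewritten:

```python
import pickle

def append_chunk(path, chunk):
    # 'ab' = binary append: the cursor starts at the end of the
    # file, so earlier records stay untouched.
    with open(path, 'ab') as f:
        pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)

def read_chunks(path):
    # Each dump() call left a self-delimiting record, so we can
    # load them back one at a time without loading the whole file.
    with open(path, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

append_chunk('data.pkl', list(range(5)))
append_chunk('data.pkl', list(range(5, 10)))
print(sum(len(c) for c in read_chunks('data.pkl')))  # 10
```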
When you write to a file, you should keep the file open the whole time, using something like with open('namefile', 'a') as f
(note mode 'a', append, since you don't want to touch the existing data). This way you save the time spent opening and closing the file, and the cursor is always at the end. If you decide to do that, call f.flush()
after each write to avoid losing data if something bad happens. Append
mode is a good fit here anyway.
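The write-in-chunks loop described above could look something like this (the generator, file name, and chunk size are made-up placeholders; tune the chunk size to your available RAM):

```python
CHUNK_SIZE = 1000  # items to buffer before each write (placeholder)

def generate_values(n):
    # Stand-in for the script that produces the data stream.
    for i in range(n):
        yield i * i

with open('output.txt', 'a') as f:   # 'a': cursor stays at the end
    buf = []
    for value in generate_values(5000):
        buf.append(str(value))
        if len(buf) >= CHUNK_SIZE:
            f.write('\n'.join(buf) + '\n')
            f.flush()                # don't lose the chunk on a crash
            buf.clear()              # free the memory and keep going
    if buf:                          # write whatever is left over
        f.write('\n'.join(buf) + '\n')
```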
If you provide some code it would be easier to help you...

-
Don't advise using pickle for speed; that's not the proper tool. If you want speed for structured data, use plain binary I/O, not pickle. – Serge Ballesta Sep 27 '16 at 14:45
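For homogeneous numeric data, the plain binary I/O this comment suggests can be done with the standard-library array module; a hedged sketch (file name `data.bin` is a placeholder):

```python
from array import array

def append_doubles(path, values):
    # 'ab' = binary append; the raw 8-byte doubles are written
    # straight to the end of the file, old bytes untouched.
    with open(path, 'ab') as f:
        array('d', values).tofile(f)

def read_doubles(path):
    out = array('d')
    with open(path, 'rb') as f:
        out.frombytes(f.read())
    return out.tolist()

append_doubles('data.bin', [1.0, 2.0, 3.0])
append_doubles('data.bin', [4.0, 5.0])
print(read_doubles('data.bin'))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```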