Cycling a binary file in Python - Remove from beginning add to end

Question

Could anyone point me towards a method of cycling a binary file in Python? I have a file full of 4-byte integers, and when the file reaches a certain size (that is, once a certain number of values have been written), I want to start removing one from the start and adding one at the end.

I'm still reasonably new to Python, so just trying to think of a neat way of doing this.

Thanks.

I'm not sure that matters? It's a number that I define; when the file reaches that, I want it to cycle. 2000 or something, why is that relevant? – Adam Cobb Oct 19 '10 at 10:40
Up to a billion, you can do it all in memory. After a billion, you need something more clever. The number you define matters a great deal. For 2000, do the entire thing in memory and don't think any more about it. – S.Lott Oct 19 '10 at 11:28

score 3 · Accepted Answer · answered Oct 19 '10 at 09:31

My idea: the first integer in the file gives you the position of the actual beginning of the data, i.e. the oldest value. At the start this will be 4 (assuming an integer takes 4 bytes). When the file is full, you just start overwriting data at the beginning and advance the position integer, wrapping it back to 4 once it passes the last slot. This is basically a simple ring buffer in file form.
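
The idea above could be sketched roughly as follows. This is only one possible reading of it, not necessarily what the answerer had in mind: the names `append_value` and `read_values`, the `capacity` parameter, and the little-endian layout are all assumptions made for illustration.

```python
import os
import struct

INT = struct.Struct("<i")  # assumed layout: little-endian signed 4-byte integers
HEADER = INT.size          # the first 4 bytes hold the offset of the oldest value


def append_value(path, value, capacity):
    """Append value; once capacity values are stored, overwrite the oldest."""
    if not os.path.exists(path):
        # New file: the data starts right after the header, at offset 4.
        with open(path, "wb") as f:
            f.write(INT.pack(HEADER))
    with open(path, "r+b") as f:  # "r+b" allows overwriting bytes in place
        (start,) = INT.unpack(f.read(HEADER))
        size = f.seek(0, os.SEEK_END)
        count = (size - HEADER) // INT.size
        if count < capacity:
            # Not full yet: the position is already at the end, so just append.
            f.write(INT.pack(value))
        else:
            # Full: overwrite the oldest slot, then advance the start offset.
            f.seek(start)
            f.write(INT.pack(value))
            start += INT.size
            if start >= HEADER + capacity * INT.size:
                start = HEADER  # wrap around to the first data slot
            f.seek(0)
            f.write(INT.pack(start))


def read_values(path):
    """Return the stored values, oldest first."""
    with open(path, "rb") as f:
        (start,) = INT.unpack(f.read(HEADER))
        data = f.read()
    split = start - HEADER
    ordered = data[split:] + data[:split]
    return [v for (v,) in INT.iter_unpack(ordered)]
```

Each call touches at most two 4-byte slots (the overwritten value and the header), so the cost of a write does not grow with the size of the file.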

Björn Pollex


This would work fine, but it seems in Python you can't write to the front of a file without rewriting the whole lot - i.e. seek then write doesn't appear to work? – Adam Cobb Oct 19 '10 at 10:31
I really don't want to have to rewrite the whole list each time either, as it is very large! – Adam Cobb Oct 19 '10 at 10:31
@Adam Cobb: You'd have to make that an extra question. You may have to flush the file before seeking or something like that. – Björn Pollex Oct 19 '10 at 10:34
Got it working using the suggestion from this question - http://stackoverflow.com/questions/508983/how-to-overwrite-some-bytes-in-the-middle-of-a-file-with-python. Using "r+b" as the file mode. – Adam Cobb Oct 19 '10 at 10:48
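
For readers who hit the same problem, here is a small sketch of why the file mode matters. The file name and values are made up for the demonstration. `"wb"` truncates the file on open, and in append mode (`"ab"`) writes may go to the end of the file regardless of the seek position on some systems, whereas `"r+b"` opens an existing file for reading and writing without truncating it.

```python
import os
import struct
import tempfile

# Hypothetical demo file holding three little-endian 4-byte integers.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
with open(path, "wb") as f:
    f.write(struct.pack("<3i", 10, 20, 30))

# "r+b" keeps the existing contents, so seek() followed by write()
# replaces bytes in place instead of rewriting the whole file.
with open(path, "r+b") as f:
    f.seek(0)
    f.write(struct.pack("<i", 99))

with open(path, "rb") as f:
    values = struct.unpack("<3i", f.read())

print(values)  # (99, 20, 30)
```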

S.Lott · Answer 2 · 2010-10-22T10:11:16.253

3

2000 numbers?

That's 8K as 4-byte integers, and only 16K even at 8 bytes per number. Do it in memory. Indeed, by declaring your buffers to be 16K, you can probably do the entire operation in a single I/O request. And on some large 64-bit systems, 2000 numbers is more or less the default buffer size.

Your data volume is microscopic. Don't waste time optimizing such a minuscule amount of data.

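# read_the_numbers() and write_the_numbers() are placeholders for your own
# helpers that decode and encode the integers; a_new_number is the value to add.
with open("my file.dat", "rb", 16384) as the_file:
    my_circular_queue = list(read_the_numbers(the_file))

# Once the queue is full, drop the oldest value before appending the new one.
if len(my_circular_queue) >= 2000:
    my_circular_queue = my_circular_queue[1:]
my_circular_queue.append(a_new_number)

# Rewrite the whole file in one buffered request.
with open("my file.dat", "wb", 16384) as the_file:
    write_the_numbers(the_file, my_circular_queue)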

It totally fits in memory. Don't waste time trying to finesse a complex update.
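
The helpers `read_the_numbers` and `write_the_numbers` are not defined in the answer. A minimal sketch of what they might look like is below, assuming the file holds little-endian signed 4-byte integers and nothing else. The wrapper `append_number` is a hypothetical name, and it swaps the list slice for a `collections.deque` with `maxlen`, which discards the oldest item automatically.

```python
import struct
from collections import deque

INT = struct.Struct("<i")  # assumed layout: little-endian signed 4-byte integers
MAX_VALUES = 2000          # the limit mentioned in the question's comments


def read_the_numbers(the_file):
    """Yield each integer stored in an open binary file."""
    for (number,) in INT.iter_unpack(the_file.read()):
        yield number


def write_the_numbers(the_file, numbers):
    """Write the integers back to back to an open binary file."""
    for number in numbers:
        the_file.write(INT.pack(number))


def append_number(path, a_new_number, max_values=MAX_VALUES):
    """Load the file, append one value, and rewrite it, keeping max_values."""
    try:
        with open(path, "rb", 16384) as the_file:
            queue = deque(read_the_numbers(the_file), maxlen=max_values)
    except FileNotFoundError:
        queue = deque(maxlen=max_values)  # first write: start with an empty queue
    queue.append(a_new_number)  # a full deque drops its oldest item here
    with open(path, "wb", 16384) as the_file:
        write_the_numbers(the_file, queue)
```

Note that `iter_unpack` raises `struct.error` if the file length is not a multiple of 4, so a partially written file would need separate handling.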

edited Oct 22 '10 at 10:11

answered Oct 19 '10 at 11:31

S.Lott


Thanks for your answer, but I've gone with Space_C0wb0y's solution. I'm sure it would be fine doing this in memory, but these writes can happen quite regularly and I don't want to be loading the whole list into memory each time just to write it back (even if it is only 2000). – Adam Cobb Oct 22 '10 at 07:52
@Adam Cobb: 2000 floating-point numbers occupy 16K of memory. On some systems, the default I/O buffers are larger than this. It's a negligible, microscopic amount of data. You are wasting time trying to finesse a sophisticated I/O scheme when the amount of data is so microscopically small. – S.Lott Oct 22 '10 at 10:09
