Could anyone point me towards a method of cycling a binary file in Python? I have a file full of 4-byte integers, and when the file reaches a certain size (i.e. a certain number of values have been written), I want to start removing one value from the start each time I add one at the end.

I'm still reasonably new to Python, so just trying to think of a neat way of doing this.

Thanks.

Adam Cobb
  • How many is "full"? A million? A billion? Several billion? – S.Lott Oct 19 '10 at 10:12
  • I'm not sure that matters? It's a number that I define; when the file reaches it, I want it to cycle. 2000 or something. Why is that relevant? – Adam Cobb Oct 19 '10 at 10:40
  • Up to a billion, you can do it all in memory. After a billion, you need something more clever. The number you define matters a great deal. For 2000, do the entire thing in memory and don't think any more about it. – S.Lott Oct 19 '10 at 11:28

2 Answers

My idea: the first integer in the file gives you the position of the actual beginning of the data. At the start this will be 4 (assuming an integer takes 4 bytes). When the file is full, you just start overwriting data at the beginning and increase the position integer. This is basically a simple ring-buffer in file-form.
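A minimal sketch of how this ring-buffer layout could look, under some assumptions that are not in the original post: values are little-endian signed 4-byte integers, the header stores the byte offset of the oldest value, and the names `ring_append` and `ring_read` are hypothetical. The file is opened in "r+b" mode so that seeking and overwriting in place works without rewriting the whole file.

```python
import os
import struct

INT = struct.Struct("<i")   # assumption: little-endian signed 4-byte integers
HEADER_SIZE = INT.size      # the header is one integer: offset of the oldest value


def ring_append(path, value, capacity):
    """Append value; once capacity values are stored, overwrite the oldest."""
    if not os.path.exists(path):
        # New file: the oldest value will sit directly after the header.
        with open(path, "wb") as f:
            f.write(INT.pack(HEADER_SIZE))
    full_size = HEADER_SIZE + capacity * INT.size
    with open(path, "r+b") as f:
        (start,) = INT.unpack(f.read(HEADER_SIZE))
        f.seek(0, os.SEEK_END)
        if f.tell() < full_size:
            f.write(INT.pack(value))  # not full yet: just grow the file
            return
        # Full: overwrite the oldest slot, then advance the start offset.
        f.seek(start)
        f.write(INT.pack(value))
        start += INT.size
        if start >= full_size:
            start = HEADER_SIZE       # wrap around to the first data slot
        f.seek(0)
        f.write(INT.pack(start))


def ring_read(path):
    """Return the stored values, oldest first."""
    with open(path, "rb") as f:
        (start,) = INT.unpack(f.read(HEADER_SIZE))
        data = f.read()
    values = [v for (v,) in INT.iter_unpack(data)]
    split = (start - HEADER_SIZE) // INT.size
    return values[split:] + values[:split]
```

Each append touches at most eight bytes of the file (one data slot plus the header), regardless of the capacity. Note that the start offset has to wrap back to the first data slot once it passes the last one.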

Björn Pollex
  • This would work fine, but it seems in Python you can't write to the front of a file without rewriting the whole lot - i.e. seek then write doesn't appear to work? – Adam Cobb Oct 19 '10 at 10:31
  • I really don't want to have to rewrite the whole list each time either, as it is very large! – Adam Cobb Oct 19 '10 at 10:31
  • @Adam Cobb: You'd have to make that an extra question. You may have to flush the file before seeking or something like that. – Björn Pollex Oct 19 '10 at 10:34
  • Got it working using the suggestion from this question - http://stackoverflow.com/questions/508983/how-to-overwrite-some-bytes-in-the-middle-of-a-file-with-python. Using "r+b" as the file mode. – Adam Cobb Oct 19 '10 at 10:48

2000 numbers?

That's 8K as 4-byte integers (16K even if each value took 8 bytes). Do it in memory. Indeed, by declaring your buffers to be 16K, you can probably do the entire operation in a single I/O request. And on some large 64-bit systems, 2000 numbers is more or less the default buffer size.

Your data volume is microscopic. Don't waste time optimizing such a minuscule amount of data.

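import struct

def read_the_numbers(the_file):  # hypothetical helper; assumes 4-byte little-endian ints
    return (n for (n,) in struct.iter_unpack("<i", the_file.read()))

def write_the_numbers(the_file, numbers):  # hypothetical helper, same format
    the_file.write(struct.pack("<%di" % len(numbers), *numbers))

with open("my file.dat", "rb", 16384) as the_file:
    my_circular_queue = list(read_the_numbers(the_file))

if len(my_circular_queue) >= 2000:
    my_circular_queue = my_circular_queue[1:]  # drop the oldest value
my_circular_queue.append(a_new_number)  # a_new_number: the value you want to store

with open("my file.dat", "wb", 16384) as the_file:
    write_the_numbers(the_file, my_circular_queue)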

It totally fits in memory. Don't waste time trying to finesse a complex update.
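As a variation on the list slicing in this answer, here is a hedged sketch that uses `collections.deque` with `maxlen`, which drops the oldest value automatically when a new one is appended to a full queue. The helper names `push` and `to_bytes` are hypothetical, and the little-endian 4-byte integer format is an assumption taken from the question's "4 byte integers".

```python
import struct
from collections import deque

CAPACITY = 2000  # the limit mentioned in the question's comments


def push(values, new_value, capacity=CAPACITY):
    """Return a bounded deque with new_value appended.

    When the deque is already at capacity, the oldest value drops off.
    """
    queue = deque(values, maxlen=capacity)
    queue.append(new_value)
    return queue


def to_bytes(queue):
    """Pack the queue as 4-byte little-endian integers for a single write() call."""
    return struct.pack("<%di" % len(queue), *queue)
```

With 2000 values, `to_bytes` produces 8000 bytes, which fits comfortably inside the 16K buffer used above.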

S.Lott
  • Thanks for your answer, but I've gone with Space_C0wb0y's solution. I'm sure it would be fine doing this in memory, but these writes can happen quite regularly and I don't want to be loading the whole list into memory each time just to write it back (even if it is only 2000). – Adam Cobb Oct 22 '10 at 07:52
  • @Adam Cobb: 2000 floating-point numbers occupy 16K of memory. On some systems, the default I/O buffers are larger than this. It's a negligible, microscopic amount of data. You are wasting time trying to finesse a sophisticated I/O scheme when the amount of data is so microscopically small. – S.Lott Oct 22 '10 at 10:09