
I am trying to write a byte array at the beginning of a file, and at a (much) later point I want to split them again and retrieve the original file. The byte_array is just a small JPEG.

# write a byte array at the beginning of a file
def write_byte_array_to_beginning_of_file( byte_array, file_path, out_file_path ):
    with open( file_path, "rb" ) as f:
        with open( out_file_path, "wb" ) as f2:
            f2.write( byte_array )
            # read() with no size argument loads the entire input file into memory
            f2.write( f.read( ) )

While the function works, it hogs a lot of memory. It seems to read the whole file into memory before doing anything. Some of the files I need to work on are in excess of 40 GB, and this all runs on a small NAS with 8 GB of RAM.

What would be a memory-conscious way to achieve this?
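
For the later split, the same idea works in reverse, assuming the length of the prepended JPEG is known at that point (how that length is stored is up to the application; the helper below is a hypothetical sketch, not the asker's code):

def strip_prefix_from_file( prefix_length, in_file_path, out_file_path, chunksize = 10 * 1024 * 1024 ):
    with open( in_file_path, "rb" ) as f, open( out_file_path, "wb" ) as f2:
        f.seek( prefix_length )  # skip the prepended bytes
        while True:
            block = f.read( chunksize )  # copy the payload in bounded chunks
            if not block:
                break
            f2.write( block )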

globus243
  • Prepending to a 40 GB file is generally going to be very slow. Ideally append, but if that's not an option then you could leave enough space at the start of the file so you can overwrite the blank section of bytes without actually changing its length (see the sketch after these comments). As another alternative (and this is filesystem specific) you can potentially add file sectors to the start of a file instead and write to just those, leaving the rest of the file untouched. – Luke Briggs Jul 31 '22 at 00:50
  • Seems like a bad idea. Why do you want to do this? – Kelly Bundy Jul 31 '22 at 01:04
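
A minimal sketch of the "reserve space up front" idea from the first comment, assuming you control how the combined file is created in the first place; HEADER_SIZE and both helper names are hypothetical:

HEADER_SIZE = 1 * 1024 * 1024  # hypothetical reserved region; must exceed any JPEG you prepend

def create_with_placeholder( data_path, out_path, chunksize = 10 * 1024 * 1024 ):
    # When first writing the file, put a zeroed placeholder ahead of the payload.
    with open( data_path, "rb" ) as src, open( out_path, "wb" ) as dst:
        dst.write( b"\x00" * HEADER_SIZE )
        while True:
            block = src.read( chunksize )
            if not block:
                break
            dst.write( block )

def fill_header( out_path, byte_array ):
    # Later, overwrite the reserved region in place; the large payload is never rewritten.
    if len( byte_array ) > HEADER_SIZE:
        raise ValueError( "byte_array larger than the reserved header" )
    with open( out_path, "r+b" ) as f:
        f.seek( 0 )
        f.write( byte_array )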

1 Answer


You can read from the original file in chunks instead of reading the whole thing.

def write_byte_array_to_beginning_of_file( byte_array, file_path, out_file_path, chunksize = 10 * 1024 * 1024 ):
    with open( file_path, "rb" ) as f, open( out_file_path, "wb" ) as f2:
        f2.write( byte_array )  # write the prefix first
        while True:
            block = f.read(chunksize)  # read at most chunksize bytes
            if not block:  # an empty result means end of file
                break
            f2.write(block)

This reads the input in 10 MB chunks by default, which you can override via the chunksize parameter.
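
For what it's worth, the standard library's shutil.copyfileobj implements the same chunked copy loop, so an equivalent version (a sketch under the same assumptions, not part of the original answer) could be:

import shutil

def write_byte_array_to_beginning_of_file( byte_array, file_path, out_file_path, chunksize = 10 * 1024 * 1024 ):
    with open( file_path, "rb" ) as f, open( out_file_path, "wb" ) as f2:
        f2.write( byte_array )  # write the prefix first
        shutil.copyfileobj( f, f2, chunksize )  # stream the rest in chunksize-byte blocks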

Barmar