I have a large binary file (60 GB) that I want to split into several smaller files. I iterated over the file and recorded the offsets at which I want to split it using the fileObject.tell() method, so I now have an array of 1000 split points called file_pointers. I am looking for a way to create files from those split points, so the function would look like:

def split_file(file_object, file_pointers):
     # Do something here

and it would create a file for every chunk. I saw this question, but I am afraid that looping in Python could be too slow, and I also feel there must be a built-in function that does something similar.

Hristo Vrigazov

1 Answer

This is a lot simpler than I thought, but I will post my answer here in case anyone wants a quick solution. Here is an example of copying the bytes from file_pointers[1] up to file_pointers[2] into a new file:

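with open('train_example.bson', 'rb') as fbson:
    # Jump to the start of the chunk
    fbson.seek(file_pointers[1])
    # Read the whole chunk into memory (its size is the gap between the two offsets)
    bytes_chunk = fbson.read(file_pointers[2] - file_pointers[1])
    with open('tmp.bson', 'wb') as output_file:
        output_file.write(bytes_chunk)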
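
The same idea can be extended to the split_file(file_object, file_pointers) function from the question. The following is only a sketch under some assumptions: file_pointers is sorted in ascending order, and it includes offset 0 and the end-of-file offset if the first and last chunks are wanted. The out_dir and buffer_size parameters and the chunk_NNNN.bin naming are made up for illustration. Instead of reading each chunk in one call, it copies in bounded blocks so that memory use stays small even if a chunk is very large.

```python
import os


def split_file(file_object, file_pointers, out_dir=".", buffer_size=16 * 1024 * 1024):
    """Write the bytes between consecutive split points to separate files.

    Assumes file_object is opened in binary mode and file_pointers is sorted.
    out_dir, buffer_size and the output file names are illustrative choices.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # Pair each offset with the next one: (p0, p1), (p1, p2), ...
    for index, (start, end) in enumerate(zip(file_pointers, file_pointers[1:])):
        path = os.path.join(out_dir, f"chunk_{index:04d}.bin")
        file_object.seek(start)
        remaining = end - start
        with open(path, "wb") as output_file:
            # Copy in blocks of at most buffer_size bytes
            while remaining > 0:
                block = file_object.read(min(buffer_size, remaining))
                if not block:
                    break  # hit end of file before reaching the end offset
                output_file.write(block)
                remaining -= len(block)
        paths.append(path)
    return paths
```

It could be called as split_file(fbson, file_pointers) inside the with block above. The Python-level loop runs once per block rather than once per byte, so the time should be dominated by disk I/O rather than by the interpreter, although that is worth measuring on the real file.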
Hristo Vrigazov