This is a classic time-space tradeoff. Allocating lots of small blocks is likely to be less efficient than one big block, assuming you need the entire contents.
Ideally, the file format encodes metadata such as block sizes and chunk counts up front. If it doesn't, making a separate pass over the file just to determine the required buffer size would likely cost more than it saves, given how slow disk access is compared to memory.
The most efficient approach also depends on how much processing is required. You mention parsing, but for a binary file that presumably means traversing many chunks and variable-sized structures?
There are a few strategies you can try:
If the files are not too large to fit in memory, you could query the filesystem for the file's size, read it in as one big chunk, then pull it apart in memory. This is very fast but uses a lot of memory.
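A minimal sketch in portable C (the fseek/ftell idiom is a common way to size a stream without platform-specific calls; on POSIX you could use fstat() instead):

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into one heap-allocated buffer.
   Returns NULL on failure; on success *out_size holds the byte count. */
static unsigned char *read_whole_file(const char *path, long *out_size)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;

    /* Size the file by seeking to the end and asking for the offset. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    unsigned char *buf = malloc((size_t)size);
    if (!buf) { fclose(fp); return NULL; }

    /* One big read: a single syscall-heavy pass instead of many small ones. */
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    *out_size = size;
    return buf;
}
```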
Depending on the structure of the binary file, you might be able to do a few fseek() calls to figure out how big the chunks you need to read are (if you don't need the entire file) and read just those.
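For instance, assuming a hypothetical RIFF-like layout (a 4-byte tag followed by a 4-byte little-endian payload length; your format's headers will differ), you can walk the chunk headers and skip the payloads you don't care about:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk layout: 4-byte tag, 4-byte little-endian payload
   length, then the payload. Scans forward from the current position,
   skipping unwanted payloads with fseek(). Returns the file offset of
   the matching payload (and its length via *out_len), or -1. */
static long find_chunk(FILE *fp, const char tag[4], uint32_t *out_len)
{
    unsigned char hdr[8];
    while (fread(hdr, 1, 8, fp) == 8) {
        uint32_t len = (uint32_t)hdr[4] | ((uint32_t)hdr[5] << 8)
                     | ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);
        if (memcmp(hdr, tag, 4) == 0) {
            *out_len = len;
            return ftell(fp);                    /* now at the payload */
        }
        if (fseek(fp, (long)len, SEEK_CUR) != 0) /* skip this payload */
            break;
    }
    return -1;
}
```

This only touches 8 bytes per uninteresting chunk, so for large files where you need one or two chunks it can be far cheaper than reading everything. (For files over 2 GB you'd want fseeko()/ftello() instead of the long-based calls.)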
You could use mmap() to map the file into memory and let the operating system page the data in on demand.
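On POSIX systems that might look like the sketch below (Windows has the equivalent CreateFileMapping/MapViewOfFile APIs):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Pages are faulted in from disk only
   when the data is actually touched. */
static const unsigned char *map_file(const char *path, size_t *out_size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return NULL; }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping keeps the file open */
    if (p == MAP_FAILED)
        return NULL;

    *out_size = (size_t)st.st_size;
    return (const unsigned char *)p; /* free later with munmap(p, *out_size) */
}
```

Because pages are loaded lazily, this works well when your parser jumps around and only touches part of the file; you get pointer-style access without committing memory for data you never read.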