I have a file containing data that is meaningful only in chunks of certain size which is appended at the start of each chunk, for e.g.
{chunk_1_size}
{chunk_1}
{chunk_2_size}
{chunk_2}
{chunk_3_size}
{chunk_3}
{chunk_4_size}
{chunk_4}
{chunk_5_size}
{chunk_5}
.
.
{chunk_n_size}
{chunk_n}
The file is really really big ~ 2GB and the chunk size is ~20MB (which is the buffer that I want to have)
I would like to Buffer read this file to reduce the number to calls to actual hard disk.
But I am not sure how much buffer to have because the chunk size may vary.
pseudo code of what I have in mind:
while(!EOF) {
/*chunk is an integer i.e. 4 bytes*/
readChunkSize();
/*according to chunk size read the number of bytes from file*/
readChunk(chunkSize);
}
If lets say I have random buffer size then I might crawl into situations like:
- First Buffer contains chunkSize_1 + chunk_1 + partialChunk_2 --- I have to keep track of leftover and then from the next buffer get the remaning chunk and concatenate to leftover to complete the chunk
- First Buffer contains chunkSize_1 + chunk_1 + partialChunkSize_2 (chunk size is an integer i.e. 4 bytes so lets say I get only two of those from first buffer) --- I have to keep track of partialChunkSize_2 and then get remaning bytes from the next buffer to form an integer that actually gives me the next chunkSize
- Buffer might not even be able to get one whole chunk at a time -- I have to keep hitting read until the first chunk is completely read into memory