I am implementing range-search code in C++ over a big file (20 GB), where I need to search for a specific range for each of many different queries.
I have divided the big file into smaller chunks to speed up the search. There are two levels, a root and leaves, and the data is stored in the leaves (following the same idea as an ISAM tree).
i.e.:
I have 3,000,000,000 lines of data
Divided into 30,000 pages, each page with 100,000 lines
A root that points to each page (so the root has 30,000 entries).
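To make the layout concrete, here is a minimal sketch of the two-level lookup described above; all names (`Root`, `first_key_of_page`, etc.) are my own assumptions, not the actual code. The root holds the first key of each of the 30,000 pages, and a range query binary-searches it to find which pages can intersect the range:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical root level of the ISAM-style layout: one sorted entry
// (the first key) per data page.
struct Root {
    std::vector<int64_t> first_key_of_page;  // 30,000 entries, sorted

    // Index of the first page whose contents may intersect [lo, hi].
    std::size_t first_page(int64_t lo) const {
        auto it = std::upper_bound(first_key_of_page.begin(),
                                   first_key_of_page.end(), lo);
        if (it == first_key_of_page.begin()) return 0;
        return static_cast<std::size_t>(it - first_key_of_page.begin()) - 1;
    }

    // One past the last page that may intersect the range.
    std::size_t last_page(int64_t hi) const {
        auto it = std::upper_bound(first_key_of_page.begin(),
                                   first_key_of_page.end(), hi);
        return static_cast<std::size_t>(it - first_key_of_page.begin());
    }
};
```

A query for the range `[lo, hi]` would then scan pages `first_page(lo)` up to (but not including) `last_page(hi)`.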
However, I noticed that once the search range starts at page 200 or higher, the stream becomes significantly slower. I close each page's stream after I am finished with it. Is there any reason why the reading stream becomes so slow?
- I am running on a Linux machine
- I don't have the option of using multi-threading
- Reads from these files are sequential
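For reference, the per-query page scan I described looks roughly like this (the file naming scheme `page_<n>.txt` and the function name are assumptions for illustration, not my actual code). Each page is a separate file: the stream is opened, read sequentially line by line, then closed before moving to the next page:

```cpp
#include <fstream>
#include <string>

// Sketch of the sequential page scan: open one page file at a time,
// read it line by line, and let the stream close before the next open.
long long scan_pages(std::size_t first_page, std::size_t last_page) {
    long long lines_seen = 0;
    for (std::size_t p = first_page; p <= last_page; ++p) {
        std::ifstream page("page_" + std::to_string(p) + ".txt");
        std::string line;
        while (std::getline(page, line)) {
            ++lines_seen;  // real code would parse the line and apply the range filter
        }
        // `page` goes out of scope here, closing the stream before the next page opens
    }
    return lines_seen;
}
```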