I'm currently working on a thesis project: designing a cache implementation to be used with a shortest-path graph algorithm. The graph algorithm's runtimes are rather inconsistent, so it's too troublesome to benchmark the entire algorithm; I need to concentrate on benchmarking only the cache.
The caches I need to benchmark are a dozen or so implementations of a Map interface. These caches are designed to work well with a given access pattern (the order in which keys are queried by the algorithm above). However, a single run of even a "small" problem issues a few hundred billion queries, and I need to run nearly all of them to be confident in the benchmark results.
I'm having conceptual problems with loading the data into memory. It's possible to create a query log: an on-disk ordered list of all the keys (they're 10-character string identifiers) that were queried in one run of the algorithm. That file is huge. My other thought was to break the log into chunks of 1-5 million queries and benchmark each chunk in the following manner (a rough sketch of this loop appears after the list):
- Load 1-5 million keys
- Set start time to current time
- Query them in order
- Record elapsed time (current time - start time)
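In code, I imagine the harness looking roughly like this. The file name, chunk size, and the `HashMap` stand-in are all placeholders of mine; a real run would swap in each cache implementation under test:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class ChunkedCacheBenchmark {
    static final int CHUNK_SIZE = 1_000_000;       // somewhere in the 1-5 million range

    public static void main(String[] args) throws IOException {
        Map<String, Long> cache = new HashMap<>(); // stand-in for the cache under test
        String[] chunk = new String[CHUNK_SIZE];
        long totalNanos = 0;
        long sink = 0;                             // consume results so the JIT can't elide the queries

        try (BufferedReader log = Files.newBufferedReader(Paths.get("query.log"))) {
            int n;
            while ((n = fillChunk(log, chunk)) > 0) {
                // All disk I/O happens here, before the clock starts,
                // so loading never counts against the cache.
                long start = System.nanoTime();
                for (int i = 0; i < n; i++) {
                    Long v = cache.get(chunk[i]);
                    sink += (v == null) ? 0 : v;
                }
                totalNanos += System.nanoTime() - start;
            }
        }
        System.out.printf("query time: %.3f s (sink=%d)%n", totalNanos / 1e9, sink);
    }

    // Read up to chunk.length keys, one per line, into chunk; return how many were read.
    static int fillChunk(BufferedReader log, String[] chunk) throws IOException {
        int i = 0;
        String line;
        while (i < chunk.length && (line = log.readLine()) != null) {
            chunk[i++] = line;
        }
        return i;
    }
}
```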
I'm unsure what effects this will have on caching. How could I perform a warm-up period? Loading the next chunk from disk will likely evict whatever the previous chunk left in the L1 or L2 caches. Also, what effect does maintaining a 1-5 million element string array have, and does merely iterating over it skew the results?
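To quantify the iteration overhead itself, I suppose I could time the same loop against a do-nothing Map and subtract that from each measurement. Something like this hypothetical stand-in (`NoOpCache` is my own name, not anything standard):

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Set;

// Hypothetical baseline "cache": get() does no work, so timing the chunk
// loop against it measures only the harness (array iteration, loop, sink).
final class NoOpCache extends AbstractMap<String, Long> {
    @Override
    public Long get(Object key) {
        return null;                  // no lookup work at all
    }

    @Override
    public Set<Map.Entry<String, Long>> entrySet() {
        return Set.of();              // required by AbstractMap; unused here
    }
}
```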
Keep in mind that the access pattern is important! For example, some of the hash tables use a move-to-front heuristic that reorders the table's internal structure, so it would be incorrect to run a single chunk multiple times or to run chunks out of order. This makes warming the CPU caches and HotSpot a bit more difficult. (I could also keep a secondary dummy cache that's used for warming but not timing; a sketch of that idea is below.)
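Here's a minimal sketch of that dummy-cache idea, assuming `dummy` is a separately constructed instance of the same Map implementation, populated the same way as the real one. Replaying the chunk against it exercises the same `get()` code path and pulls the chunk array into the CPU caches without perturbing the state of the cache being timed:

```java
// Warm-up pass: same chunk, same code path, but against a throwaway cache,
// so the measured cache's move-to-front state is left untouched.
static long warmUp(Map<String, Long> dummy, String[] chunk, int n) {
    long discard = 0;
    for (int i = 0; i < n; i++) {
        Long v = dummy.get(chunk[i]);   // triggers JIT compilation of get(),
        discard += (v == null) ? 0 : v; // warms branch predictors, and faults
    }                                   // chunk[] into L1/L2
    return discard;                     // returned so HotSpot can't elide the loop
}
```

One caveat I can see with this: the dummy shares the lookup code but not the measured cache's memory, so it warms the instruction cache and the chunk array more effectively than the data actually being queried.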
What are good practices for microbenchmarks with a giant dataset?