I'm writing a latency-critical application (a homemade HFT trading system). I have code that just converts a uint64 to a string:
// TODO: cache sprintf, use strcpy? measure?
sprintf(dest, "%" PRIu64, divRes.quot);
Here divRes.quot
is an integer guaranteed to be between 1 and 1 000 000. So I can preallocate a (pretty big) array and "cache" every single value. Then I can just execute strcpy(dest, cache[divRes.quot]).
At first glance this must be significantly faster, because strcpy
must be significantly faster than sprintf.
Note, however, that I'm using a huge array which almost surely cannot fit entirely in the CPU cache, so the second approach will almost surely go to main memory, while with the first approach I will quite likely stay inside the CPU cache (probably even in the fastest L1 cache?!).
So, on average, which would be faster:
- a slow function that stays in the CPU cache, or
- a fast function that has to access main memory?
I think it depends on how much faster one function is than the other, and on how much faster a CPU cache access is than a main memory access.
I guess it's very hard to write a realistic benchmark, because in the real application the overall system load will be different, so cache/memory usage will be different, and that can change things dramatically.
Please note that I don't care about readability, maintenance, etc.; I only need speed.