
Generally, performance is given in terms of big-O order of magnitude: O(magnitude) + K, where the constant K is usually ignored because it mainly matters for smaller values of N.

But more and more I have seen performance dominated by the size of the underlying data, and that is not part of algorithmic complexity.

Assume algorithm A is O(log N) but uses O(N) space, and algorithm B is O(N) but uses O(log N) space. It used to be the case that algorithm A was faster. Now, with cache misses in multi-tiered caches, it is likely that algorithm B will be faster for large N, and possibly for small N as well if it has a smaller K.
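To make the trade-off concrete, here is a rough back-of-envelope sketch (the per-step cycle costs are made up, purely to illustrate the kind of comparison I'd like a notation to capture):

```python
import math

MISS = 100   # assumed cycles per access when the O(N)-sized working set misses cache
HIT = 1      # assumed cycles per access when the O(log N)-sized working set stays cached

def cost_A(n):            # O(log N) steps, but each step touches cold data
    return MISS * math.log2(n)

def cost_B(n):            # O(N) steps over a small, cache-friendly working set
    return HIT * n

n = 2
while cost_B(n) < cost_A(n):   # find roughly where B stops winning
    n *= 2
print("with these made-up costs, B stops winning around N =", n)
```

Where that crossover lands obviously depends on the real miss cost and access pattern, which is exactly the part big-O drops.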

The problem is: how do you represent this?

Glenn Teitelbaum
  • I think if I just came out of a computer science class, my head would be full of big-O and cache misses, but that's because it's hard to find a computer science teacher who has had to work on the performance of *massive* software. Here's [*a counterexample*](http://stackoverflow.com/a/927773/23771) and, as @amit says, deal with it empirically. – Mike Dunlavey Aug 21 '15 at 17:21
  • If you want a good estimate of performance you need to run your algorithm(s) on the specific hardware you want the estimate for. Your post has a lot of generalities; which algorithms specifically have you seen run worse with an O(logN) implementation than an O(N) implementation? What size was the data set, and what hardware were you running on? You can't account for cache misses without specifying the processor/RAM/architecture. – Louis Ricci Aug 21 '15 at 19:51

2 Answers


Cache misses are not taken into account in big-O notation, since they are constant factors.

Even if you pessimistically assume every array access is going to be a cache miss, and say a cache miss takes 100 cycles (a constant, since we are assuming random-access memory), then iterating an array of length n is going to take 100*n cycles for the cache misses (plus overhead for the loop and control), and in general terms it remains O(n).
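To see the constant-factor point numerically (the cycle counts below are assumed, not measured):

```python
MISS = 100   # assumed cycles for a cache miss
HIT = 1      # assumed cycles for a cache hit
LOOP = 2     # assumed loop/control overhead per element

def all_miss_cycles(n):    # pessimistic: every access misses
    return (MISS + LOOP) * n

def all_hit_cycles(n):     # optimistic: every access hits
    return (HIT + LOOP) * n

for n in (10**3, 10**6, 10**9):
    # the ratio is the same constant for every n, so both estimates are O(n)
    print(n, all_miss_cycles(n) / all_hit_cycles(n))
```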

One reason big-O is used so often is that it is platform independent (well, when speaking about RAM machines at least). If we took cache misses into account, the result would be different for each platform.


If you are looking for a theoretical notation that takes constants into account, you are looking for tilde notation.
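For example, tilde notation keeps the leading constant that big-O throws away: writing f(n) ~ g(n) means f(n)/g(n) tends to 1 as n grows. So the pessimistic all-miss scan above would be written as

T(n) ~ 100n cycles

rather than just T(n) = O(n).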

Also, that's why big-O notation is seldom enough for large-scale or time-critical systems; these are constantly profiled to find bottlenecks, which are then improved locally by the developers. So if you're looking for real performance, measure it empirically, and don't settle for theoretical notation.

amit

Well, big-O notation abstracts away some important details that are only insignificant as N approaches infinity; those details can be, and often are, the most significant factors at values of N short of infinity. To help explain, consider that if a term is listed as O(N^x), it is only specifying the most significant term in N. In reality, the performance could be characterized as:

aN^x + bN^(x-1) + cN^(x-2) + ... + K

So as N approaches infinity, the dominant term becomes N^x, but clearly at values of N less than infinity one of the lesser terms could dominate. Looking at your example, algorithm A is the one that provides O(log N) time (using O(N) space), and algorithm B is the one that provides O(N) time (using O(log N) space). In reality, these two algorithms have performance characteristics as follows:

Performance A = x(log N) + y

Performance B = aN + b(log N) + c

If your constant values are a = 0.001 and x = 99,999, you can see how B provides better performance than A over a very wide range of N. In addition, you mention that one algorithm increases the likelihood of a cache miss, and that likelihood depends on the size of the data. You'll need to figure out the likelihood of a cache miss as a function of the data size and use that as a factor when calculating the performance of the overall algorithm. For example:

If the cost of a cache miss is CM (we'll assume it's constant), then for algorithm A the overall cache cost is F(N)*CM. If that cache cost is a factor in the dominant loop of algorithm A (the O(log N) part), then the real performance characteristic of algorithm A is O(F(N) * log N). For algorithm B the overall cache cost would be F(log N)*CM. If the cache miss manifests during the dominant loop of algorithm B, then the real performance of algorithm B is O(F(log N) * N). As long as you can determine F(), you can then compare algorithms A and B.
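A rough sketch of that comparison (the F() below is a placeholder, purely illustrative; in practice it has to be measured or modeled for the real access pattern and cache hierarchy):

```python
import math

def F(size):
    # Hypothetical cache-miss factor for a working set of the given size.
    # Placeholder only: estimate this for your actual data and hardware.
    return 1 + math.log2(size) if size >= 1 else 1

def cost_A(N):
    # Algorithm A: O(log N) dominant loop over O(N) data -> O(F(N) * log N)
    return F(N) * math.log2(N)

def cost_B(N):
    # Algorithm B: O(N) dominant loop over O(log N) data -> O(F(log N) * N)
    return F(math.log2(N)) * N

for N in (10**3, 10**6, 10**9):
    print(N, cost_A(N), cost_B(N))
```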

Noah