Any ideas for improving cache performance in a large-scale program?

Question

I am working on a large-scale piece of software. It is driven by moving memory/data between a huge number of complicated models.

Sometimes the number of cache misses is too high and performance suffers, but the scenario looks too complicated for me to analyze.

I just want some general ideas on how to reduce cache misses and improve memory performance.

I'd appreciate any comments.

Thanks!

I don't think it is a bad question, but it is a broad question so the answers you will get are equally vague. — I GIVE CRAP ANSWERS, Nov 17 '10 at 15:33

score 1 · Answer 1 · edited May 23 '17 at 10:26

It may be having cache misses, but don't assume that's the problem.

Find out where the problems are and fix them, as in this example.

In my experience, the larger the software is, the larger are the problems (and the opportunities for speeding it up).

Often the software has been developed on fast machines, with small input data sets, so the developers never felt the need to profile and remove performance bugs. The good news is they're all in there, just waiting for you to find and remove them, getting massive speedup, for which you can take the credit!

score 1 · Accepted Answer · answered Nov 17 '10 at 15:43

The most valuable tool when hunting performance bottlenecks is measurement. You need to figure out what code has the problem and then measure it for cache misses, if that indeed proves to be the problem.
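As a rough sketch of what "measure first" can look like, here is a tiny harness built on Python's standard `cProfile` and `pstats` modules. The functions `slow_step`, `fast_step` and `workload` are hypothetical stand-ins for your real code, not anything from the question. Note that a profiler like this only tells you where the time goes; to count actual cache misses you would then point a hardware-counter tool (Linux `perf`, for instance) at the hot spot it finds.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Hypothetical hot spot: deliberately heavier than fast_step.
    return sum(i * i for i in range(n))


def fast_step(n):
    # Hypothetical cheap step, for contrast.
    return n * 2


def workload():
    # Stand-in for the real program's main loop.
    total = 0
    for _ in range(50):
        total += slow_step(20_000)
        total += fast_step(20_000)
    return total


def profile(func):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile(workload)
print(report)
```

In the printed report, `slow_step` should sit near the top of the cumulative-time column, which is the kind of evidence you want before spending effort on cache behaviour.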

As for general ideas, you will need to lower the miss rate. So when you pull data into the cache, do as much work on it as possible before it gets evicted again, rather than streaming over the data repeatedly. Compare, as an example:

for i in data:
   f(i)

for i in data:
   g(i)

for i in data:
   h(i)

which traverses the list three times. It may be possible to write this as:

for i in data:
   h(g(f(i)))

reducing the number of traversals to one, which usually leads to fewer misses.
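A runnable version of the same idea, as a minimal sketch: it assumes `f`, `g` and `h` are element-wise stages where each one consumes the previous stage's output (the bodies below are made up for illustration). Under that assumption the three-pass and one-pass versions give the same result, and the fused one touches each element only once.

```python
def f(x):
    # Hypothetical first stage.
    return x + 1


def g(x):
    # Hypothetical second stage.
    return x * 2


def h(x):
    # Hypothetical third stage.
    return x - 3


def three_passes(data):
    # Each stage walks the whole list and builds an intermediate list.
    step1 = [f(x) for x in data]
    step2 = [g(x) for x in step1]
    return [h(x) for x in step2]


def one_pass(data):
    # All three stages are applied while the element is still "hot".
    return [h(g(f(x))) for x in data]


data = list(range(10))
assert three_passes(data) == one_pass(data)
```

In CPython the interpreter overhead will probably hide most of the cache effect; the gain tends to show up more clearly in compiled code or on data sets much larger than the cache.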

Another worthy trick is to think about the data structure. The access patterns of a binary tree are very different from those of a hash table. But establish measurement first so you can be sure you have the misses nailed down, and that it is the misses that are your problem.
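To make the data-structure point a bit more concrete, here is a small sketch that stores the same values two ways: as a chain of separately allocated nodes (pointer chasing, roughly how a tree behaves) and as one contiguous buffer from the standard `array` module (sequential access). The `Node` class and helper names are invented for this example. Both traversals compute the same sum; what differs is the memory access pattern, and how much that matters depends on your allocator and hardware, so measure it.

```python
from array import array


class Node:
    # One separately allocated object per element.
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_linked(values):
    # Build the chain back to front so it preserves the input order.
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def sum_linked(head):
    # Every step follows a pointer to wherever the next node lives.
    total = 0.0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total


def sum_contiguous(buf):
    # Elements sit next to each other in one block of memory.
    total = 0.0
    for v in buf:
        total += v
    return total


values = [float(i) for i in range(1000)]
linked = build_linked(values)
packed = array("d", values)
assert sum_linked(linked) == sum_contiguous(packed) == 499500.0
```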

Finally, even with low miss rates, you can look into lowering memory bandwidth usage in general. If you move lots and lots of data, it tends to be slow, since memory speeds grow at a much lower rate than transistor counts.
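One simple way to move less data is to store it in a narrower type when the value range allows it. The sketch below assumes hypothetical readings in the range 0..99 and compares a 64-bit and an 8-bit layout using the standard `array` module; the `footprint` helper is made up for the example.

```python
from array import array

# Hypothetical data: small integers that fit comfortably in one byte.
readings = [i % 100 for i in range(10_000)]

wide = array("q", readings)    # 8 bytes per element (signed 64-bit)
narrow = array("b", readings)  # 1 byte per element (signed 8-bit, -128..127)


def footprint(buf):
    # Bytes occupied by the element data itself.
    return len(buf) * buf.itemsize


# Same values, one eighth of the bytes to pull through the memory system.
assert list(wide) == list(narrow)
print(footprint(wide), footprint(narrow))  # 80000 10000
```

Whether the narrower layout is acceptable depends entirely on your real value ranges, so treat this as an illustration rather than a recommendation for any particular field.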

score 0 · Answer 3 · answered Nov 17 '10 at 15:21

This is a giant topic with no detail in the question. So, I'd suggest buying more RAM.

Alex Miller

I'd say: buy more RAM and simplify the complicated models. – Simone Nov 17 '10 at 15:29
