Cachegrind simply simulates execution on a CPU, modelling how the cache and branch predictor might behave. Knowing how long you would spend blocked on the cache requires a lot more information. Specifically, you need to know when execution can be speculated and how many instructions can be dispatched in parallel (as well as how multiple memory accesses can be handled simultaneously). Cachegrind can't do this, and any tool that could would depend heavily on the processor (whereas cache miss counts are much less processor dependent).
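To make that concrete, here is a toy back-of-the-envelope model of why a miss count alone doesn't give you a stall time. The function name, the latency figure and the overlap figures are all made-up assumptions for illustration, and the formula is a deliberate oversimplification of what a real out-of-order core does.

```python
def estimated_stall_cycles(misses, miss_latency_cycles, overlapped_misses):
    """Toy model: misses that are in flight together share one latency period.

    All three inputs are illustrative assumptions; cachegrind only reports
    the first one (the miss count).
    """
    if overlapped_misses < 1:
        raise ValueError("at least one miss must be in flight at a time")
    return misses * miss_latency_cycles / overlapped_misses


# Same miss count, very different stall time depending on overlap.
misses = 1_000_000
latency = 200  # assumed cycles per miss, purely illustrative

dependent = estimated_stall_cycles(misses, latency, overlapped_misses=1)
independent = estimated_stall_cycles(misses, latency, overlapped_misses=8)

print(f"dependent loads (no overlap):    {dependent:.0f} cycles")
print(f"independent loads (8 in flight): {independent:.0f} cycles")
```

The two cases have identical miss counts, which is all cachegrind sees, yet the estimated stall differs by the assumed overlap factor. How much overlap you actually get is exactly the processor-specific part.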
If you have access to a modern Intel CPU, I'd recommend getting a free copy of VTune (for non-commercial purposes) and seeing what it says. It can tell the processor to collect data on cache misses and report it back to you, so you can see what actually happened rather than just a simulation. It will give you a clocks-per-instruction figure for each line of code, and from this you can see which lines are blocking on the cache (and for how long). It can also give you all the other information cachegrind can.
You can get it here:
http://software.intel.com/en-us/articles/non-commercial-software-download/
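As a rough sketch of how you might read the clocks-per-instruction figure mentioned above: divide the cycles attributed to a line by the instructions it executed, and look hardest at the lines where that ratio is unusually high. The helper names, the sample numbers and the threshold below are all hypothetical, not VTune output or part of its API.

```python
def clocks_per_instruction(cycles, instructions):
    """Average clock cycles spent per executed instruction."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def suspicious_lines(samples, threshold):
    """Return (line, CPI) pairs whose CPI exceeds the threshold."""
    flagged = []
    for line, cycles, instructions in samples:
        cpi = clocks_per_instruction(cycles, instructions)
        if cpi > threshold:
            flagged.append((line, cpi))
    return flagged


# Hypothetical per-line counts: (source line, cycles, instructions).
# These numbers are invented for illustration, not real profiler output.
samples = [
    (10, 1_200, 1_000),
    (11, 45_000, 900),
    (12, 800, 1_000),
]

# The threshold is an arbitrary assumption; pick one that suits your code.
for line, cpi in suspicious_lines(samples, threshold=4.0):
    print(f"line {line}: CPI {cpi:.1f}")
```

With these invented numbers only line 11 is flagged. A high CPI doesn't prove the line is waiting on the cache, but combined with the miss counts for the same line it is a strong hint.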