I have a set of n vectors stored in the 3 x n matrix z. I find the outer product using np.einsum. When I timed it using:

%timeit v=np.einsum('i...,j...->ij...',z,z)

I got the result:

The slowest run took 7.23 times longer than the fastest. This could mean that an
intermediate result is being cached 
100000 loops, best of 3: 2.9 µs per loop

What is happening here, and can it be avoided? The best of 3 is 2.9 µs, but the slowest may be more typical.

ali_m
user3799584
  • 4
    For testing purposes, try increasing the size `n`; this will reduce the fraction of `z` that is stored in your CPU's cache, and the message should disappear at some point. – Saullo G. P. Castro Apr 21 '15 at 10:26
  • 1
    The report says 100000 loops. Could the caching be from the first loop to the rest? If so, then we have to take only the first loop's time. – Arvind Padmanabhan Oct 15 '18 at 06:33

1 Answer

The message "intermediate result is being cached" is just a blind guess in the canned message reported by %timeit. It may or may not be true, and you should not assume it is correct.

In particular, one of the most common reasons for the first run being slowest is that the array is in the CPU cache only after the first run.
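A minimal sketch of how one might check this, assuming a hypothetical n = 1000 (the question does not state n): timing each call separately with `timeit.repeat` and `number=1` keeps the first, possibly cold-cache run visible instead of folding it into an average. The helper name `outer` is only for illustration.

```python
import timeit
import numpy as np

n = 1000  # hypothetical size; the question does not give n
z = np.random.rand(3, n)

def outer():
    # Same expression as in the question: per-vector outer product, shape (3, 3, n)
    return np.einsum('i...,j...->ij...', z, z)

# number=1 means each entry in `times` is the duration of a single call,
# so times[0] is the very first run.
times = timeit.repeat(outer, repeat=20, number=1)

print("first run: %.2f µs" % (times[0] * 1e6))
print("median:    %.2f µs" % (np.median(times) * 1e6))
print("best:      %.2f µs" % (min(times) * 1e6))
```

If the first run is much slower than the median while the rest are close to each other, the spread reported by %timeit is probably a warm-up effect rather than a sign that the typical call is slow.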

CPUs cache things automatically; you cannot avoid this, and you don't really want to avoid it. However, designing algorithms so that CPU caches can work effectively is nowadays one of the key concerns in high-performance computing, because memory access is often the bottleneck.
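To get a feel for how much the cache matters here, one could repeat the measurement for growing n, as suggested in the comments on the question. The sketch below is only an illustration: the sizes, the helper name `time_outer`, and the repeat counts are all assumptions, and the actual numbers depend on the machine.

```python
import timeit
import numpy as np

def time_outer(n, repeat=5, number=3):
    """Best per-call time in seconds for the einsum outer product at size n."""
    z = np.random.rand(3, n)
    stmt = lambda: np.einsum('i...,j...->ij...', z, z)
    # timeit.repeat returns total time for `number` calls per repetition
    return min(timeit.repeat(stmt, repeat=repeat, number=number)) / number

results = {}
for n in (10**2, 10**4, 10**6):  # hypothetical sizes, small to large
    t = time_outer(n)
    results[n] = t
    print("n=%8d  %10.2f µs per call  %8.2f ns per vector" % (n, t * 1e6, t * 1e9 / n))
```

Comparing the cost per vector across sizes shows roughly where fixed call overhead stops dominating and where memory traffic starts to.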

pv.