
I'm performing a decently complex operation on some 3- and 4-dimensional tensors using numpy's einsum.

My actual code is

np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)

This does what I want it to do.

Using einsum_path, the result is:

>>> path = np.einsum_path('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)

>>> print(path[0])
['einsum_path', (0, 1), (0, 3), (0, 1), (0, 1)]

>>> print(path[1])
  Complete contraction:  oij,imj,mjkn,lnk,plk->op
         Naive scaling:  8
     Optimized scaling:  5
      Naive FLOP count:  2.668e+07
  Optimized FLOP count:  1.340e+05
   Theoretical speedup:  199.136
  Largest intermediate:  7.700e+02 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   4                imj,oij->moj                     mjkn,lnk,plk,moj->op
   5               moj,mjkn->nok                          lnk,plk,nok->op
   4                plk,lnk->npk                              nok,npk->op
   4                 npk,nok->op                                   op->op

This indicates a theoretical speedup of about 200x.

How can I use this result to speed up my code? How do I "implement" what einsum_path is telling me?

Luca
  • You don't use the path directly. You choose the `einsum` parameter that lets it use that path. – hpaulj Feb 01 '19 at 15:52

2 Answers


Do some time tests

path = np.einsum_path('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)

np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=False)
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=True)         
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=path[0])

In my testing the last two run at the same speed. For a small problem optimize=False is faster, presumably because the analysis and rearranging take time. For a large problem, with a larger theoretical speedup, the actual speedup for True can be larger than the theoretical one. Presumably memory management is slowing down the False case.
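A timing sketch of those three variants (the shapes here are invented purely to satisfy the subscript string `'oij,imj,mjkn,lnk,plk->op'`; the asker's real arrays are presumably larger):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)

# Toy shapes consistent with 'oij,imj,mjkn,lnk,plk->op'
# (o=p=5, i=l=4, j=k=3, m=n=6) -- invented for illustration only
phi = rng.standard_normal((5, 4, 3))
B = rng.standard_normal((4, 6, 3))
Suu = rng.standard_normal((6, 3, 3, 6))

subs = 'oij,imj,mjkn,lnk,plk->op'
path = np.einsum_path(subs, phi, B, Suu, B, phi)[0]

# Compare the three optimize settings on identical inputs
for label, opt in [('False', False), ('True', True), ('path', path)]:
    t = timeit(lambda: np.einsum(subs, phi, B, Suu, B, phi, optimize=opt),
               number=1000)
    print(f'optimize={label:>5}: {t:.4f} s for 1000 calls')
```

For arrays this small, `optimize=False` may well win; the ordering tends to flip as the dimensions grow.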

The theoretical speedup is just that, an estimate based solely on the FLOP count. It will hold only to the extent that FLOPs dominate the calculation.

You could also time the path calc. The size of the problem will determine whether its time is a small or large part of the total time.
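To see how much the path search itself costs, you can time `einsum_path` on its own (again with toy shapes invented to fit the subscripts):

```python
import numpy as np
from timeit import timeit

rng = np.random.default_rng(0)
phi = rng.standard_normal((5, 4, 3))   # toy shapes, illustration only
B = rng.standard_normal((4, 6, 3))
Suu = rng.standard_normal((6, 3, 3, 6))

# Time just the path search, separate from executing the contraction
t_path = timeit(lambda: np.einsum_path('oij,imj,mjkn,lnk,plk->op',
                                       phi, B, Suu, B, phi),
                number=100)
print(f'einsum_path: {t_path / 100:.2e} s per call')
```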

hpaulj
  • The last is **exactly** what I was looking for. I now understand what the einsum doc means when it says "Also accepts an explicit contraction list from the np.einsum_path function. See np.einsum_path for more details." The problem is that the einsum_path documentation does not report this use of path[0]. I'll suggest adding this example to the doc – Luca Feb 02 '19 at 20:20
  • @LucaAmerio there is an example in the most recent numpy einsum docs regarding path calculation, where the [0] is added when it calculates the path and then uses it. Albeit I can see the confusion, the note exists already. – Attack68 Mar 08 '19 at 17:09

From the source code

Theoretical Speedup = speedup = naive_cost / opt_cost
naive_cost = _flop_count(idx_contract, idx_removed, len(input_sets), idx_dict)

So judging from this, to speed up the process you would need to lower your FLOP count (the number of floating-point operations). Since the naive cost is the cost of the unoptimized expression, you need to rewrite your expression in a way that removes any "junk" associated with it without changing its underlying structure.

Judging from your question, which states that you are doing some complex expressions, this may not be possible. But to answer your question, try rewriting your expression in a more compact way in order to lower the naive cost (and with it the theoretical speedup).

You could also try using a different contraction path, which would lower your FLOP count.

Edeki Okoh
  • So einsum_path does not tell you the "optimal" expression, it only tells you that a faster way exists. Did I understand correctly? – Luca Feb 01 '19 at 15:48
  • Yes. If you look at naive_cost and opt_cost in the [source code](https://github.com/numpy/numpy/blob/v1.15.0/numpy/core/einsumfunc.py#L691-L979) you can see that naive cost uses input_sets (number of elements in the list) while opt_cost uses positions (The locations of the proposed tensors to contract). So it does not tell you which proposed tensors the algorithm used for the opt_cost, but it determines this. – Edeki Okoh Feb 01 '19 at 16:03
  • It would be very nice to have something to "copy paste" out of shelf into einsum to optimize it. At the moments the result seems to be saying "Nice try, but you can do better. Try again" – Luca Feb 01 '19 at 16:04
  • You can use the def _optimal_path function to see the optimal path. That is a good place to start. As hpaulj you could also choose the path that optimizes it which should work also. – Edeki Okoh Feb 01 '19 at 16:05
  • It is possible to pass the list (the first element) returned by `...path` as the `optimize` parameter for a subsequent `einsum` call. But that shouldn't be necessary. It should be applying that optimization by default. Compare a default run against an `optimize=False` run. You should see a speed up. It probably won't be the full 200x, because there's more to these calculations than counting FLOPs. – hpaulj Feb 01 '19 at 17:31
  • I need to perform this operation thousands of times. So it would be great to optimise it once and then avoid the optimisation at every iteration. Could you share the code on how to pass the list returned by ...path to einsum? – Luca Feb 01 '19 at 19:44
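A minimal sketch of what that might look like (toy shapes invented to fit the subscripts; substitute the real arrays and the real per-iteration update): compute the path once, then reuse it on every iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.standard_normal((5, 4, 3))   # toy shapes, illustration only
B = rng.standard_normal((4, 6, 3))
Suu = rng.standard_normal((6, 3, 3, 6))

subs = 'oij,imj,mjkn,lnk,plk->op'

# Pay the path-search cost once, up front ([0] picks the path list
# out of the (path, report) tuple that einsum_path returns)
path = np.einsum_path(subs, phi, B, Suu, B, phi)[0]

results = []
for _ in range(1000):
    # ... update phi, B, Suu here in the real application ...
    results.append(np.einsum(subs, phi, B, Suu, B, phi, optimize=path))
```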