
Imagine that I have integers n, q and vectors/arrays with these dimensions:

import numpy as np
n = 100
q = 102
A = np.random.normal(size=(n,n))
B = np.random.normal(size=(q, ))

C = np.einsum("i, jk -> ijk", B, A)
D = np.einsum('ijk, ikj -> k', C, C)

which works fine as long as all intermediate arrays fit in memory.

Now assume that I can store in memory arrays of shape (n, n) or (q, n), but not any three-dimensional array such as one of shape (q, n, n). In particular, I cannot store the array C above in memory. Instead, to compute D,

D1 = np.einsum('i, jk, i, kj -> k', B, A, B, A, optimize='optimal')

works fine, and np.einsum is typically smart enough to find an einsum_path such that no 3-d array is ever constructed. Great!
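
One way to check this is to print the report returned by np.einsum_path for the same expression; the report lists the chosen contraction order and the size of the largest intermediate (the exact wording may vary across numpy versions):

path, report = np.einsum_path('i, jk, i, kj -> k', B, A, B, A, optimize='optimal')
print(report)   # shows the contraction order and the largest intermediate shape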

Now let's complicate things slightly:

C = np.einsum("i, jk -> ijk", B, A)    # as before

Y2 = np.random.normal(size=(n, ))
Z2 = np.random.normal(size=(q, n))
C2 = np.einsum("j, ik -> ijk", Y2, Z2)

E = np.einsum('ijk, ikj -> k', C+C2, C+C2)

Here I cannot find a reasonable way (reasonable as in short, readable code) to construct E without building intermediate 3-d arrays such as C and C2.

Questions:

  1. Is there an np.einsum one-liner that would construct E without building the intermediate 3-d arrays C and C2?
    The following appears to work by distributing the sum into four cross terms, but it is rather impractical compared to the hypothetical API in question 2...
E_CC   = np.einsum('i, jk, i, kj -> k', B,  A,  B,  A, optimize='optimal') # as D before
E_C2C2 = np.einsum('j, ik, k, ij -> k', Y2, Z2, Y2, Z2, optimize='optimal')
E_CC2  = np.einsum('i, jk, k, ij -> k', B,  A,  Y2, Z2, optimize='optimal')
E_C2C  = np.einsum('j, ik, i, kj -> k', Y2, Z2, B,  A, optimize='optimal')

E_new  = E_CC + E_C2C2 + E_CC2 + E_C2C 
np.isclose(E_new, E) # all True!

  2. Is there a "lazy" version of np.einsum that would wait until the final call to find an optimal einsum_path across the composition of several lazy einsums, including sums as in the example above? For instance, with a hypothetical einsum_lazy, the following would construct E without ever storing a 3-d array (such as C or C2) in memory:
C = np.einsum_lazy("i, jk -> ijk", B, A)  # nothing has been computed yet!
C2 = np.einsum_lazy("j, ik -> ijk", Y2, Z2) # nothing has been computed yet!
E = np.einsum('ijk, ikj -> k', C+C2, C+C2)  # expands the sums and uses an optimal einsum_path to compute E
jlewk

2 Answers


This is a really fascinating question - as @s-m-e mentioned, numpy does not offer lazy einsum computation, but it does offer a lower-level function called np.einsum_path, which np.einsum uses internally to find the optimal contraction order.

What if you did this:

C_path = np.einsum_path("i, jk -> ijk", B, A)[0]
C2_path = np.einsum_path("j, ik -> ijk", Y2, Z2)[0]
CC2_path = C_path + C2_path[1:]

And somehow used the path in a final computation? The biggest issue here is that you're summing C and C2, and elementwise addition is not currently supported by einsum, so it's hard to optimize that.

Take a look at @Eelco Hoogendoorn's response to a similar question: maybe breaking it up into smaller computations isn't such a bad idea :)
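
If you do want to automate that "breaking it up", here is a rough sketch of what such a lazy wrapper could look like. LazyEinsum and lazy_contract are made-up names (not numpy API): the idea is just to record the small factors of C and C2, distribute the sum, and hand each resulting product term to a single np.einsum call with optimize='optimal', so no 3-d array is ever built.

from itertools import product
import numpy as np

class LazyEinsum:
    # A hypothetical lazy einsum value: a sum of products of small factors.
    # Each term is a list of (labels, array) pairs; out_labels names the
    # axes of the (never materialized) result, e.g. "ijk".
    def __init__(self, out_labels, terms):
        self.out_labels = out_labels
        self.terms = terms

    @classmethod
    def from_expr(cls, expr, *operands):
        inputs, out = expr.split('->')
        labels = [s.strip() for s in inputs.split(',')]
        return cls(out.strip(), [list(zip(labels, operands))])

    def __add__(self, other):
        assert self.out_labels == other.out_labels
        return LazyEinsum(self.out_labels, self.terms + other.terms)

def lazy_contract(expr, *lazies):
    # Expand the sum of products and contract each product term separately,
    # so np.einsum only ever sees the small 1-d/2-d factors.
    inputs, out = expr.split('->')
    op_labels = [s.strip() for s in inputs.split(',')]
    out = out.strip()
    result = None
    for combo in product(*(lz.terms for lz in lazies)):
        subs, arrays = [], []
        for lz, labels, term in zip(lazies, op_labels, combo):
            relabel = dict(zip(lz.out_labels, labels))  # e.g. map ijk -> ikj
            for f_labels, arr in term:
                subs.append(''.join(relabel[c] for c in f_labels))
                arrays.append(arr)
        partial = np.einsum(','.join(subs) + '->' + out, *arrays,
                            optimize='optimal')
        result = partial if result is None else result + partial
    return result

C_lazy  = LazyEinsum.from_expr('i, jk -> ijk', B, A)
C2_lazy = LazyEinsum.from_expr('j, ik -> ijk', Y2, Z2)
S = C_lazy + C2_lazy
E_lazy = lazy_contract('ijk, ikj -> k', S, S)   # four small einsum calls
np.allclose(E_lazy, E)                          # True

This just automates the four-term expansion from the question; each of the four np.einsum calls can again pick a path that avoids any 3-d intermediate.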

v0rtex20k

Targeting question 2:

There is no lazy version of einsum, unfortunately. einsum simply returns a numpy ndarray object - which is precisely what a subsequent call to einsum would expect as a parameter in your scenario. However, you can exploit Python itself by using generators. In your case, the following would do the trick:

C1 = (np.einsum("i, jk -> ijk", b, a) for a, b in ((A, B),))
C2 = (np.einsum("j, ik -> ijk", y2, z2) for y2, z2 in ((Y2, Z2),))

def _einsum(v, w):
    u = v + w # no need to do this twice
    return np.einsum('ijk, ikj -> k', u, u)

E = (_einsum(c1, c2) for c1, c2 in zip(C1, C2))

for e in E: # only HERE C1, C2 and E are actually computed
    print(e)

The above example uses chained generator expressions. It is the final for loop that triggers the actual evaluation of the chain, so it is lazy, more or less. There is also another downside: from a memory perspective, C1 and C2 are in fact constructed (temporarily) once the chain is evaluated.

If memory consumption is your primary concern and you are doing multiple similar operations, have a look at the out parameter of einsum. In fact, most numpy ufuncs have an out parameter, which allows you to specify a preexisting numpy ndarray as the target for the result of the operation. No new memory needs to be allocated, which can also speed up your computation as a side effect.
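
As a small sketch (buf is just an illustrative name, and the expression is the D computation from the question), something along these lines writes the result into a preallocated buffer instead of allocating a new output array on every call:

buf = np.empty(n)                                   # allocated once, reused afterwards
np.einsum('i, jk, i, kj -> k', B, A, B, A,
          out=buf, optimize='optimal')              # result is written into buf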

s-m-e