
I have the following piece of code doing exactly what I want (it is part of a kriging method), but the problem is that it is too slow, and I would like to know if there is any way to push the for loop down into numpy. If I pull out the numpy.sum and use its axis argument there, it speeds things up a little, but apparently that is not the bottleneck. Any ideas on how I can push the for loop down into numpy, or other ways to speed it up?

import numpy

n = 2116
print GRZVV.shape   # (16309, 2116)
print GinvVV.shape  # (2117, 2117)
VVg = numpy.empty(GRZVV.shape[0])

for k in xrange(GRZVV.shape[0]):
    # build the k-th right-hand side: row k of GRZVV with a 1 appended
    GRVV = numpy.empty((n + 1, 1))
    GRVV[n, 0] = 1
    GRVV[:n, 0] = GRZVV[k, :]
    EVV = numpy.array(GinvVV * GRVV)  # GinvVV is a numpy.matrix, so * is a matrix product
    VVg[k] = numpy.sum(EVV[:n, 0] * VV)

I posted the dimensions of the ndarrays and the matrix above to clear things up.

Edit: the shape of VV is (2116,).

usethedeathstar

2 Answers


You could do the following in place of your loop over k (runtime ~3s):

# append a column of ones to GRZVV (this is the 1 you were appending to each row)
tmp = np.concatenate((GRZVV, np.ones((GRZVV.shape[0], 1), dtype=np.double)), axis=1)
EVV1 = np.dot(GinvVV, tmp.T)  # one big matrix product instead of 16309 small ones
# changed the line below based on askewchan's recommendation (broadcasting replaces np.tile)
VVg1 = np.sum(np.multiply(EVV1[:n, :], VV[:, np.newaxis]), axis=0)
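
To check that this matches the loop, here is a minimal self-contained sketch, using scaled-down sizes and random stand-in data (the names m and n and the values here are made up for illustration, since the original arrays aren't posted):

import numpy as np

m, n = 50, 30  # stand-ins for 16309 and 2116
GRZVV = np.random.rand(m, n)
GinvVV = np.matrix(np.random.rand(n + 1, n + 1))
VV = np.random.rand(n)

# original loop
VVg = np.empty(m)
for k in range(m):
    GRVV = np.empty((n + 1, 1))
    GRVV[n, 0] = 1
    GRVV[:n, 0] = GRZVV[k, :]
    EVV = np.array(GinvVV * GRVV)
    VVg[k] = np.sum(EVV[:n, 0] * VV)

# vectorized version from above
tmp = np.concatenate((GRZVV, np.ones((m, 1), dtype=np.double)), axis=1)
EVV1 = np.dot(GinvVV, tmp.T)
VVg1 = np.sum(np.multiply(EVV1[:n, :], VV[:, np.newaxis]), axis=0)

# np.dot on a numpy.matrix returns a matrix, so flatten before comparing
assert np.allclose(VVg, np.asarray(VVg1).ravel())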
Joel Vroom
  • This gives the same result as @usethedeathstar's code, and runs 15x faster on my machine. – askewchan Sep 23 '13 at 13:55
  • No need for the tile call, as the `np.multiply` broadcasts. Change it to: `VVg1 = np.sum(np.multiply(EVV1[:n,:],VV[:,np.newaxis]), axis=0)` for a small speedup. – askewchan Sep 23 '13 at 14:08
  • +1 Much, much faster than `np.einsum` for large arrays, not sure I understand why... – Jaime Sep 23 '13 at 16:11
  • Slight issue: VVg1 is still a numpy.matrix instead of an ndarray. Is this affecting the np.multiply or so? I assume/hope not, or should I cast EVV1 to a numpy.array after the np.dot? – usethedeathstar Sep 25 '13 at 08:18
  • Another side question: what is the difference between `[:, numpy.newaxis]` and `[:, None]`? And why use numpy.multiply instead of just `*`? Also, the speedup is 12 times (which is NICE). – usethedeathstar Sep 25 '13 at 09:20
  • Difference between np.newaxis and None: there is none, np.newaxis is simply an alias for None --> http://stackoverflow.com/questions/944863/numpy-should-i-use-newaxis-or-none – Joel Vroom Sep 25 '13 at 12:49
  • `np.multiply` was used to do element-wise multiplication on the np.matrix, since `*` means matrix multiplication there (http://stackoverflow.com/questions/4151128/what-are-the-differences-between-numpy-arrays-and-matrices-which-one-should-i-u) – Joel Vroom Sep 25 '13 at 12:58
  • @JoelVroom so basically I can cast it to an ndarray again after the numpy.dot and be done with it? – usethedeathstar Sep 26 '13 at 07:57

You are basically taking each row of GRZVV, appending a 1 at the end, multiplying it by GinvVV, and then adding up the first n elements of the resulting vector, scaled by VV. If you weren't doing the "append 1" thing, you could do it all with no loops as:

VVg = np.sum(np.dot(GinvVV[:-1, :-1], GRZVV.T), axis=0) * VV

or even, using np.einsum (the subscripts 'ij,kj->k' sum over both i and j, doing the matrix product and the final sum in a single step):

VVg = np.einsum('ij,kj->k', GinvVV[:-1, :-1], GRZVV) * VV

How do we handle that extra 1? Well, each entry of the vector coming out of the matrix multiplication gets incremented by the corresponding value in GinvVV[:-1, -1], so when you add up the first n entries, the total is incremented by np.sum(GinvVV[:-1, -1]). That correction is the same for every row of GRZVV, so we can simply calculate it once and add it to all the items in the return vector:

VVg = (np.einsum('ij,kj->k', GinvVV[:-1, :-1], GRZVV) + np.sum(GinvVV[:-1, -1])) * VV

The above code works if VV is a scalar. If it is an array of shape (n,), then the following will work:

GinvVV = np.asarray(GinvVV)  # back to a plain ndarray, so * is element-wise
VVgbis = (np.einsum('ij,kj->k', GinvVV[:-1, :-1] * VV[:, None], GRZVV) +
          np.dot(GinvVV[:-1, -1], VV))
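
As a sanity check, here is a small self-contained sketch (again with made-up sizes and random stand-in data) comparing this against the original loop:

import numpy as np

m, n = 40, 25  # stand-ins for 16309 and 2116
GRZVV = np.random.rand(m, n)
GinvVV = np.random.rand(n + 1, n + 1)  # already a plain ndarray here
VV = np.random.rand(n)

# original loop, written with ndarrays
VVg = np.empty(m)
for k in range(m):
    GRVV = np.empty(n + 1)
    GRVV[n] = 1
    GRVV[:n] = GRZVV[k, :]
    EVV = np.dot(GinvVV, GRVV)
    VVg[k] = np.sum(EVV[:n] * VV)

# einsum version
VVgbis = (np.einsum('ij,kj->k', GinvVV[:-1, :-1] * VV[:, None], GRZVV) +
          np.dot(GinvVV[:-1, -1], VV))

assert np.allclose(VVg, VVgbis)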
Jaime
  • With @usethedeathstar's edit, `VV.shape` is now (2116,), so it doesn't broadcast in your solution (since `VVg.shape` is (16309,)) – askewchan Sep 23 '13 at 13:44
  • Figured it was a scalar. With a full array, not collapsing the array till the end, as in Joel's answer, is much faster than the above for large arrays. – Jaime Sep 23 '13 at 16:12