11

I have a matrix

A = np.array([[0.2, 0.4, 0.6],
              [0.5, 0.5, 0.5],
              [0.6, 0.4, 0.2]])

I want a new matrix in which the entry in row i, column j is the product of all the entries of the ith row of A except the one in column j.

array([[ 0.24,  0.12,  0.08],
       [ 0.25,  0.25,  0.25],
       [ 0.08,  0.12,  0.24]])

The solution that first occurred to me was

np.repeat(np.prod(A, 1, keepdims=True), 3, axis=1) / A

But this only works as long as no entry is zero.
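
For example (a quick demonstration added for illustration, not part of the original post), a zero entry turns its own cell into 0/0:

import numpy as np

# a row containing a zero: the row product is 0, so the zero's own
# cell becomes 0/0 = nan (with a RuntimeWarning) instead of 0.25
B = np.array([[0.0, 0.5, 0.5]])
print(np.repeat(np.prod(B, 1, keepdims=True), 3, axis=1) / B)
# [[ nan   0.    0. ]]   -- desired: [[ 0.25  0.    0.  ]]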

Any thoughts? Thank you!

Edit: I have developed

B = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        B[i, j] = np.prod(A[i, [x for x in range(3) if x != j]])

but surely there is a more elegant way to accomplish this, one that uses NumPy's efficient C backend instead of slow Python loops?

everybody

5 Answers

4

If you're willing to tolerate a single loop:

B = np.empty_like(A)
for col in range(A.shape[1]):
    B[:,col] = np.prod(np.delete(A, col, 1), 1)

That computes what you need, a single column at a time. It is not as efficient as theoretically possible because np.delete() creates a copy; if you care a lot about memory allocation, use a mask instead:

B = np.empty_like(A)
mask = np.ones(A.shape[1], dtype=bool)
for col in range(A.shape[1]):
    mask[col] = False
    B[:,col] = np.prod(A[:,mask], 1)
    mask[col] = True
John Zwinck
    +1, Memory-wise, though, the two are equivalent. Indexing using a boolean mask creates a copy as well. There's no advantage to the second version. (Boolean indexing is a form of "fancy" indexing which always makes a copy. Unlike normal slicing, there's no way to describe the result in terms of an offset and strides.) – Joe Kington Mar 16 '14 at 20:13
  • @JoeKington: You're right, thank you. Do you know if there is a smarter way to subset the columns the way we need? We might like to avoid a copy, but we need to express the idea "all columns except one" for any arbitrary column. I don't see how that is possible with offsets and strides, but maybe you know some other way which could save memory here? Or maybe there is no way in NumPy today. – John Zwinck Mar 17 '14 at 03:42 (a quick check of the copy behavior follows below)
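
A short sketch (an editor's addition, not from the thread) confirming Joe's point that boolean indexing copies while basic slicing does not:

import numpy as np

A = np.arange(12.0).reshape(3, 4)
mask = np.ones(4, dtype=bool)
mask[1] = False

sliced = A[:, :2]   # basic slice: a view into A
fancy = A[:, mask]  # boolean ("fancy") index: always a fresh copy

print(np.shares_memory(A, sliced))  # True
print(np.shares_memory(A, fancy))   # False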
3

A variation on your repeat solution uses [:,None] to broadcast:

np.prod(A,axis=1)[:,None]/A

My first stab at handling zeros:

In [21]: B
Out[21]: 
array([[ 0.2,  0.4,  0.6],
       [ 0. ,  0.5,  0.5],
       [ 0.6,  0.4,  0.2]])

In [22]: np.prod(B,axis=1)[:,None]/(B+np.where(B==0,1,0))
Out[22]: 
array([[ 0.24,  0.12,  0.08],
       [ 0.  ,  0.  ,  0.  ],
       [ 0.08,  0.12,  0.24]])

But as DSM pointed out in the comments, the [1,0] cell should be 0.25.

This corrects that problem (at a zero, take the product of the other entries, computed with the zeros replaced by ones; elsewhere, divide out as before), but it still has problems when there are multiple zeros in a row.

In [30]: I=B==0
In [31]: B1=B+np.where(I,1,0)
In [32]: B2=np.prod(B1,axis=1)[:,None]/B1
In [33]: B3=np.prod(B,axis=1)[:,None]/B1
In [34]: np.where(I,B2,B3)
Out[34]: 
array([[ 0.24,  0.12,  0.08],
       [ 0.25,  0.  ,  0.  ],
       [ 0.08,  0.12,  0.24]])

In [55]: C
Out[55]: 
array([[ 0.2,  0.4,  0.6],
       [ 0. ,  0.5,  0. ],
       [ 0.6,  0.4,  0.2]])

With two zeros in a row, every entry of that row should be 0, but the same masking idea (In [64] uses intermediate variables from my session that are not shown here) gives:

In [64]: np.where(I,sum1[:,None],sum[:,None])/C1
Out[64]: 
array([[ 0.24,  0.12,  0.08],
       [ 0.5 ,  0.  ,  0.5 ],
       [ 0.08,  0.12,  0.24]])

Blaz Bratanic's epsilon approach (with a small offset, e.g. eps = 1e-10) is the best non-iterative solution (so far):

In [74]: np.prod(C+eps,axis=1)[:,None]/(C+eps)

A different solution iterating over the columns:

def paulj(A):
    # accumulate the product of every nonzero cyclic shift of the columns;
    # column j ends up holding the product of its row excluding A[:,j]
    P = np.ones_like(A)
    for i in range(1, A.shape[1]):
        P *= np.roll(A, i, axis=1)
    return P

In [130]: paulj(A)
array([[ 0.24,  0.12,  0.08],
       [ 0.25,  0.25,  0.25],
       [ 0.08,  0.12,  0.24]])
In [131]: paulj(B)
array([[ 0.24,  0.12,  0.08],
       [ 0.25,  0.  ,  0.  ],
       [ 0.08,  0.12,  0.24]])
In [132]: paulj(C)
array([[ 0.24,  0.12,  0.08],
       [ 0.  ,  0.  ,  0.  ],
       [ 0.08,  0.12,  0.24]])

I tried some timings on a large matrix:

In [13]: A=np.random.randint(0,100,(1000,1000))*0.01

In [14]: timeit paulj(A)
1 loops, best of 3: 23.2 s per loop

In [15]: timeit blaz(A)
10 loops, best of 3: 80.7 ms per loop

In [16]: timeit zwinck1(A)
1 loops, best of 3: 15.3 s per loop

In [17]: timeit zwinck2(A)
1 loops, best of 3: 65.3 s per loop

The epsilon approximation is probably the best speed we can expect, but it has some rounding issues. Having to iterate over many columns hurts the speed of the column-loop approaches. I'm not sure why the np.prod(A[:,mask], 1) approach is slowest.
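
To make the rounding issue concrete (my own check, not one of the timed functions above): after adding eps, entries that should be exact are off by roughly eps, which is why Blaz rounds the result with np.around.

import numpy as np

eps = 1e-10
A = np.array([[0.2, 0.4, 0.6],
              [0.5, 0.5, 0.5],
              [0.6, 0.4, 0.2]])
exact = np.array([[0.24, 0.12, 0.08],
                  [0.25, 0.25, 0.25],
                  [0.08, 0.12, 0.24]])
approx = np.prod(A + eps, axis=1)[:, None] / (A + eps)
print(np.abs(approx - exact).max())  # on the order of eps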

Eelco (https://stackoverflow.com/a/22441825/901925) suggested using as_strided. Here is what I think he has in mind (adapted from an overlapping-blocks question, https://stackoverflow.com/a/8070716/901925):

def strided(A):
    h,w = A.shape
    A2 = np.hstack([A,A])
    x,y = A2.strides
    strides = (y,x,y)
    shape = (w, h, w-1)
    blocks = np.lib.stride_tricks.as_strided(A2[:,1:], shape=shape, strides=strides)
    P = blocks.prod(2).T # faster to prod on last dim
    # alt: shape = (w-1, h, w), and P=blocks.prod(0)
    return P

Timing for the (1000,1000) array is quite an improvement over the column iterations, though still much slower than the epsilon approach.

In [153]: timeit strided(A)
1 loops, best of 3: 2.51 s per loop

Another indexing approach, while relatively straightforward, is slower and produces memory errors sooner, since the fancy indexing materializes an (h, w, w-1) intermediate array.

def foo(A):
    h,w = A.shape
    # row j of I holds the w-1 column indices whose product forms output column j
    I = (np.arange(w)[:,None] + np.arange(1,w)) % w
    P = A[:,I].prod(2)   # materializes an (h, w, w-1) intermediate
    return P
hpaulj
  • I don't think that it *does* work if there are zeroes. The 0th element of the first row should be 0.25, shouldn't it? – DSM Mar 16 '14 at 17:21
  • I've corrected that problem, though the solution is messier than what I'd like. – hpaulj Mar 16 '14 at 17:34
  • Unfortunately I don't think that will work either. Say there are two zeroes in a row: then every output row should be zero because every product contains a zero. But your new method will turn the zeroes into the product of the nonzero elements (because they'll take the case where the zeroes are replaced by 1.) This is much easier to see if you use a 4x4 matrix to test, which has a row with two nonzero elements and two zeroes. – DSM Mar 16 '14 at 17:36
  • [By "every output row" I of course meant "every element of the output row", not that the entire output array should be zero.] – DSM Mar 16 '14 at 17:44
  • hpaulj: I didn't notice you had already included a version of what I had in mind, but I added an edit to my post as well. Your version isn't entirely correct; but regardless, the large speed difference somewhat surprises me. Indeed the prod is taken over a small-stride axis, so we should get good cache behavior; the only additional cost is in concatenating the array. Yet the epsilon approach makes three passes over the array besides the actual prod as well... I don't quite get it. – Eelco Hoogendoorn Mar 17 '14 at 07:17
  • I get it now... not enough sleep. My method requires O(n^3) operations, as opposed to O(n^2). – Eelco Hoogendoorn Mar 17 '14 at 08:18
  • I posted another method, which is O(n^2). Curious to see how it compares. – Eelco Hoogendoorn Mar 17 '14 at 09:42
3

I'm on the run, so I do not have time to work out this solution; but what I'd do is create a contiguous circular view over the last axis, by concatenating the array to itself along the last axis, and then use np.lib.stride_tricks.as_strided to select the appropriate elements to take an np.prod over. No Python loops, no numerical approximation.

edit: here you go:

import numpy as np

A = np.array([[0.2, 0.4, 0.6],
              [0.5, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.6, 0.4, 0.2]])

# concatenating A to itself gives a circular view of each row:
# the w-1 entries starting at column j+1 are exactly the row minus A[:,j]
B = np.concatenate((A,A), axis=1)
C = np.lib.stride_tricks.as_strided(
        B,
        A.shape + A.shape[1:],
        B.strides + B.strides[1:])
D = np.prod(C[...,1:], axis=-1)

print(D)

Note: this method is not ideal, as it is O(n^3). See my other posted solution, which is O(n^2).

Eelco Hoogendoorn
2

If you are willing to tolerate a small error, you could use the solution you first proposed:

A += 1e-10
np.around(np.repeat(np.prod(A, 1, keepdims=True), 3, axis=1) / A, 9)
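
For instance (a quick check, not part of the original answer), a row containing a zero now comes out exactly after rounding:

import numpy as np

B = np.array([[0.0, 0.5, 0.5]])
B += 1e-10
print(np.around(np.repeat(np.prod(B, 1, keepdims=True), 3, axis=1) / B, 9))
# [[ 0.25  0.    0.  ]]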
Blaz Bratanic
1

Here is an O(n^2) method without Python loops or numerical approximation:

def double_cumprod(A):
    # L[:,j] holds the product of A[:,:j]; R[:,j] holds the product of A[:,j:];
    # the answer for column j is then L[:,j] * R[:,j+1]
    B = np.empty((A.shape[0],A.shape[1]+1),A.dtype)
    B[:,0] = 1
    B[:,1:] = A
    L = np.cumprod(B, axis=1)          # prefix products
    B[:,1:] = A[:,::-1]
    R = np.cumprod(B, axis=1)[:,::-1]  # suffix products
    return L[:,:-1] * R[:,1:]
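
For example (a quick check, not part of the original answer), it handles rows containing zeros exactly, where the division tricks above fail:

import numpy as np

B = np.array([[0.2, 0.4, 0.6],
              [0.0, 0.5, 0.5],
              [0.6, 0.4, 0.2]])
print(double_cumprod(B))
# [[ 0.24  0.12  0.08]
#  [ 0.25  0.    0.  ]
#  [ 0.08  0.12  0.24]]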

Note: it appears to be about twice as slow as the numerical approximation method, which is in line with expectation: it makes two cumprod passes plus an elementwise multiply instead of a single prod and a divide.

Eelco Hoogendoorn
  • On my big matrix this tests a bit faster than Blaz's approximation. – hpaulj Mar 17 '14 at 16:18
  • I also tested it, but with a floating point 1000x1000 matrix. in that case the approximation pulls ahead on my machine, but for ints they are roughly equally matched in speed. – Eelco Hoogendoorn Mar 17 '14 at 17:19
  • This cumprod approach becomes more obvious when the problem is stated as `prod(A[j,:i])*prod(A[j,i+1:])`, that is, the product of two sequences. As originally stated, the problem focused on eliminating `A[j,i]` from a single `prod()`. – hpaulj Mar 17 '14 at 18:39