
I have an (m, n) matrix where each row is an example with n features. I want to expand it to an (m, n, n) array, i.e. for each example create the outer product of its features. I've looked into tensordot but haven't figured out a way to do so - it seems to only contract the tensors, not expand them.

import numpy as np

a = np.arange(5*3).reshape(5, 3, 1)
b = np.arange(5*3).reshape(1, 3, 5)
c = np.tensordot(a, b, axes=([1,2],[1,0]))  # gives a (5,5) matrix
c = np.tensordot(a, b, axes=([1,2],[0,1]))  # throws a shape-mismatch error

I'll give a simple example for one row. Say you have the column vector a = [1, 2, 3]; what I want to get is a * a.T, i.e.:

1, 2, 3
2, 4, 6
3, 6, 9
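
For a single row this is just the outer product of the row with itself, i.e. what np.outer gives:

import numpy as np

a = np.array([1, 2, 3])
np.outer(a, a)
# array([[1, 2, 3],
#        [2, 4, 6],
#        [3, 6, 9]])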
Maverick Meerkat
  • So that means that each value at `(i,j)` is put in `(i,j,k)` for all `k`s? – Willem Van Onsem Oct 17 '19 at 15:36
  • Can you provide some sample input and expected output? – Dan Oct 17 '19 at 15:37
  • no... it means that for each row you create the outer product of that row with itself, i.e. column x row. And you associate it with every example. – Maverick Meerkat Oct 17 '19 at 15:38
  • So for your inputs `a` and `b`, what is the desired output shape? – user3483203 Oct 17 '19 at 15:44
  • if a is (5,3,1) and b is (5,1,3) the desired shape is (5,3,3) – Maverick Meerkat Oct 17 '19 at 15:46
  • Try `np.einsum('ijk,ikl->ijl', a, b)`, but you should be able to get that result from the original matrix without creating anything intermediate. – user3483203 Oct 17 '19 at 15:47
  • Or maybe `np.einsum('ij,ik->ijk', a, a)` – Dan Oct 17 '19 at 15:48
  • @user3483203 the first one works - can you maybe add a full answer explaining how/why it works? – Maverick Meerkat Oct 17 '19 at 15:49
  • @DavidRefaeli if a is (5,3) instead of (5,3,1) then the second one should work. Which might save you the step of reshaping your matrix into a and b – Dan Oct 17 '19 at 15:51
  • @Dan yes, you are correct... all voodoo to me until someone explains why... :-) – Maverick Meerkat Oct 17 '19 at 15:52
  • it's voodoo to me to tbh, I found `np.einsum('ijk,ikl->ijl', a, b)` by googling how to broadcast matmul in numpy and then adjusted it to work for 2D. You can [read](https://en.wikipedia.org/wiki/Einstein_notation) this and try figure it out. Alternatively if you want more explicit code, maybe a for loop? – Dan Oct 17 '19 at 15:54
  • Possible duplicate of [Calculating the outer product for a sequence of numpy ndarrays](https://stackoverflow.com/questions/47624948/calculating-the-outer-product-for-a-sequence-of-numpy-ndarrays) – user3483203 Oct 17 '19 at 15:56
  • Also a dupe of https://stackoverflow.com/questions/42378936/numpy-elementwise-outer-product – user3483203 Oct 17 '19 at 15:56
  • Possible duplicate of [numpy elementwise outer product](https://stackoverflow.com/questions/42378936/numpy-elementwise-outer-product) – Dan Oct 17 '19 at 16:03

1 Answer

In [220]: a = np.arange(15).reshape(5,3)                                        
In [221]: a                                                                     
Out[221]: 
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

Using standard numpy broadcasting:

In [222]: a[:,:,None]*a[:,None,:]                                               
Out[222]: 
array([[[  0,   0,   0],
        [  0,   1,   2],
        [  0,   2,   4]],

       [[  9,  12,  15],
        [ 12,  16,  20],
        [ 15,  20,  25]],

       [[ 36,  42,  48],
        [ 42,  49,  56],
        [ 48,  56,  64]],

       [[ 81,  90,  99],
        [ 90, 100, 110],
        [ 99, 110, 121]],

       [[144, 156, 168],
        [156, 169, 182],
        [168, 182, 196]]])
In [223]: _.shape                                                               
Out[223]: (5, 3, 3)

einsum was mentioned:

In [224]: np.einsum('ij,ik->ijk',a,a).shape                                     
Out[224]: (5, 3, 3)
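
Reading the subscripts explains the "voodoo": in 'ij,ik->ijk' every input index also appears in the output, so nothing is summed; the result is simply out[i,j,k] = a[i,j] * a[i,k], one outer product per row. A quick sketch checking that against an explicit per-row loop:

out = np.einsum('ij,ik->ijk', a, a)                    # out[i,j,k] = a[i,j] * a[i,k]
check = np.array([np.outer(row, row) for row in a])    # explicit per-row outer products
np.array_equal(out, check)                             # True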

The broadcasting works by:

(5,3) => (5,3,1) and (5,1,3); (5,3,1) * (5,1,3) => (5,3,3)

`None` indexing is like your reshape(5,3,1), adding a dimension. In broadcasting, size-1 dimensions are matched against the corresponding dimension of the other array(s). reshape is cheap; use it freely.
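
For example, with the same a:

a[:, :, None].shape                                   # (5, 3, 1), same as a.reshape(5, 3, 1)
np.array_equal(a[:, :, None], a.reshape(5, 3, 1))     # True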

tensordot is not well named; the 'tensor' just means it can work with arrays larger than 2d (but then all of numpy can do that). 'dot' refers to the dot product, i.e. the contraction. With einsum and matmul/@, tensordot isn't needed, and it never worked for creating higher-dimensional arrays anyway.
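
For example, the matmul/@ version of the same per-row outer product (a sketch reusing the a above; @ broadcasts over the leading axis and matrix-multiplies the trailing two):

out_mm = a[:, :, None] @ a[:, None, :]                    # (5,3,1) @ (5,1,3) -> (5,3,3)
np.array_equal(out_mm, a[:, :, None] * a[:, None, :])     # True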

hpaulj