48

I have some data represented by input_x. It is a tensor of unknown batch size (it is fed in batches), and each item in a batch is of size n. input_x undergoes tf.nn.embedding_lookup, so that embed has dimensions [?, n, m], where m is the embedding size and ? refers to the unknown batch size.

This is set up as follows:

input_x = tf.placeholder(tf.int32, [None, n], name="input_x") 
embed = tf.nn.embedding_lookup(W, input_x)

I'm now trying to multiply each sample in my input data (which is now expanded by the embedding dimension) by a matrix variable, U, and I can't figure out how to do that.

I first tried using tf.matmul, but it gives an error due to a shape mismatch. I then tried the following, expanding the dimension of U and applying batch_matmul (I also tried the function from tf.nn.math_ops; the result was the same):

U = tf.Variable( ... )
U1 = tf.expand_dims(U, 0)
h = tf.batch_matmul(embed, U1)

This passes graph construction, but when actual data is fed in, I get the following error:

In[0].dim(0) and In[1].dim(0) must be the same: [64,58,128] vs [1,128,128]

I also know why this is happening: I replicated U along a new leading dimension, which is now 1, but the minibatch size, 64, doesn't match it.

How can I do this tensor-by-matrix multiplication correctly (for an unknown batch size)?

JYun
yoki
  • Just to add one thing: you will have to pass an initializer to the scan function, matching the size of the output of your matrix multiplication U*x. – KARAN JAIN Dec 13 '16 at 09:31
  • Currently [tf.matmul](http://stackoverflow.com/a/43829731/1090562) is the right way to do batch multiplication. – Salvador Dali May 07 '17 at 09:07

5 Answers

90

Previous answers are obsolete. Currently tf.matmul() supports tensors with rank > 2:

The inputs must be matrices (or tensors of rank > 2, representing batches of matrices), with matching inner dimensions, possibly after transposition.

Also, tf.batch_matmul() was removed; tf.matmul() is now the right way to do batch multiplication. The main idea can be understood from the following code:

import tensorflow as tf

batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
C = tf.matmul(A, B)  # C has shape (batch_size, n, k)

The result C is a tensor of shape (batch_size, n, k). Here is what is going on: assume you have batch_size matrices of size n x m and batch_size matrices of size m x k. For each pair of them, you compute (n x m) X (m x k), which gives you an n x k matrix; you end up with batch_size of them.

Notice that something like this is also valid:

A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)

and will give you a tensor of shape (a, b, n, k).
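Applied to the question's setup, where embed has shape (?, n, m) and U is a single matrix rather than a batch, a minimal sketch (assuming U has shape (m, k)) is to collapse the unknown batch dimension into the rows, do one 2D matmul, and reshape back:

embed_flat = tf.reshape(embed, [-1, m])               # (? * n, m)
h = tf.reshape(tf.matmul(embed_flat, U), [-1, n, k])  # (?, n, k)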

Salvador Dali
  • What is the correct way to do this if, like in the question, you want to multiply one matrix with many others? Do you have to replicate (tile) the single matrix batch_size times, or is there a better way? – KarlSt Jul 09 '17 at 08:19
  • @KarlSt Based on my experiments, this doesn't work when the first N-2 dimensions do not match. Clearly, the numpy version of this command supports broadcasting, but I think the only way to do it in TF is to tile the single matrix batch_size times. I have even tried playing transpose tricks (so it looks like the first matrix is [batch_size, n, m] and the second matrix is [1, m, k]), with no luck. I'm not sure it can be called a bug, but clearly this should be implemented in TF since it's such a common operation. – sirgogo Jul 13 '17 at 01:00
  • I found a better way here: https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/4tgsOSxwtkY You can conflate the two dimensions not used in the multiplication using reshape, multiply the two matrices, and then call reshape again to get the desired shape. This is equivalent to doing batch multiplication. – KarlSt Jul 13 '17 at 11:46
  • This doesn't seem to answer the original question – Akababa Nov 28 '18 at 23:59
  • @Akababa look closer and you will see that it exactly answers it. I even provided the code which shows you how to do it. – Salvador Dali Nov 29 '18 at 00:13
  • @SalvadorDali do you mean that we add another dimension to B? Because we want to multiply A (batch_size, n, m) with a weight matrix B (m, k). – keramat Jan 26 '19 at 15:54
38

1. I want to multiply a batch of matrices with a batch of matrices of the same length, pairwise

M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((batch_size, m, p))

# python >= 3.5
MN = M @ N
# or the old way,
MN = tf.matmul(M, N)
# MN has shape (batch_size, n, p)

2. I want to multiply a batch of matrices with a batch of vectors of the same length, pairwise

We fall back to case 1 by adding and removing a dimension to v.

M = tf.random_normal((batch_size, n, m))
v = tf.random_normal((batch_size, m))

Mv = (M @ v[..., None])[..., 0]
# Mv has shape (batch_size, n)

3. I want to multiply a single matrix with a batch of matrices

In this case, we cannot simply add a batch dimension of 1 to the single matrix, because tf.matmul does not broadcast in the batch dimension.
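As a minimal illustration of why (a sketch, reusing batch_size, n, m and p from above; the offending line is commented out because it fails with a shape error like the one in the question):

M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((m, p))

# MN = tf.matmul(M, N[None])  # fails: batch dimensions batch_size and 1 do not match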

3.1. The single matrix is on the right side

In that case, we can treat the matrix batch as a single large matrix, using a simple reshape.

M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((m, p))

MN = tf.reshape(tf.reshape(M, [-1, m]) @ N, [-1, n, p])
# MN has shape (batch_size, n, p)

3.2. The single matrix is on the left side

This case is more complicated. We can fall back to case 3.1 by transposing the matrices.

M = tf.random_normal((n, m))
N = tf.random_normal((batch_size, m, p))

MT = tf.matrix_transpose(M)    # (m, n)
NT = tf.matrix_transpose(N)    # (batch_size, p, m)
NTMT = tf.reshape(tf.reshape(NT, [-1, m]) @ MT, [-1, p, n])
MN = tf.matrix_transpose(NTMT)

However, transposition can be a costly operation, and here it is done twice on an entire batch of matrices. It may be better to simply duplicate M to match the batch dimension:

MN = tf.tile(M[None], [batch_size, 1, 1]) @ N
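# MN has shape (batch_size, n, p)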

Profiling will tell which option works better for a given problem/hardware combination.

4. I want to multiply a single matrix with a batch of vectors

This looks similar to case 3.2 since the single matrix is on the left, but it is actually simpler because transposing a vector is essentially a no-op. We end up with:

M = tf.random_normal((n, m))
v = tf.random_normal((batch_size, m))

MT = tf.matrix_transpose(M)
Mv = v @ MT
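# Mv has shape (batch_size, n)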

What about einsum?

All of the previous multiplications could have been written with the tf.einsum swiss-army knife. For example, the first solution for 3.2 could be written simply as

MN = tf.einsum('nm,bmp->bnp', M, N)
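The other cases read just as naturally. A sketch, reusing the shapes defined in the corresponding cases above:

MN = tf.einsum('bnm,bmp->bnp', M, N)  # case 1
Mv = tf.einsum('bnm,bm->bn', M, v)    # case 2
MN = tf.einsum('bnm,mp->bnp', M, N)   # case 3.1
Mv = tf.einsum('nm,bm->bn', M, v)     # case 4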

However, note that einsum ultimately relies on transpose and matmul for the computation.

So even though einsum is a very convenient way to write matrix multiplications, it hides the complexity of the operations underneath: for example, it is not straightforward to guess how many times an einsum expression will transpose your data, and therefore how costly the operation will be. It may also hide the fact that there can be several alternatives for the same operation (see case 3.2), and it will not necessarily choose the better option.

For this reason, I would personally use explicit formulas like those above to better convey their respective complexity. That said, if you know what you are doing and like the simplicity of the einsum syntax, then by all means go for it.

P-Gn
17

The matmul operation only works on matrices (2D tensors). Here are two main approaches to do this; both assume that U is a 2D tensor.

  1. Slice embed into 2D tensors and multiply each of them with U individually. This is probably easiest to do using tf.scan() like this:

    h = tf.scan(lambda a, x: tf.matmul(x, U), embed,
                initializer=tf.zeros([n, c]))  # c: the number of columns in U
    
  2. On the other hand, if efficiency is important, it may be better to reshape embed into a 2D tensor so that the multiplication can be done with a single matmul, like this:

    embed = tf.reshape(embed, [-1, m])
    h = tf.matmul(embed, U)
    h = tf.reshape(h, [-1, n, c])
    

    where c is the number of columns in U. The last reshape makes sure that h is a 3D tensor whose 0th dimension corresponds to the batch, just like the original input_x and embed.

Zhewriix
Styrke
  • Thank you! I do care about efficiency. How much should I avoid option 1, or does tensorflow (with GPU etc) do that efficiently more or less? About option 2, I lose some of the matrix structure this way, right? I'm surprised there's no support for this operation. Is it not a common operation? – yoki Jul 07 '16 at 21:24
  • @yoki Unless I made some mistake the results from the two approaches should be completely identical after the second reshape in option 2. I mainly included option 1 because it may be easier to understand how and why it works. I don't think what you're doing is very common outside of recurrent networks. (Which is one of the main uses of `scan`.) I noticed there's a [`batch_matmul`](https://www.tensorflow.org/versions/master/api_docs/python/math_ops.html#batch_matmul) operation that you could also use, but you would need to create a lot of copies of your `U` matrix to use that. – Styrke Jul 07 '16 at 21:51
  • @yoki Actually now that I am thinking about it, the thing you're trying to do probably doesn't really make a difference. Because matrix multiplication is associative, you would get the exact same result by multiplying `W` with `U` before you do the embedding lookup and then looking up the embeddings in that product. So unless you're doing something exotic that I don't know about, the most effective approach would be to simply define a single matrix that represents `WU` instead of actually defining both and then multiplying them together. – Styrke Jul 07 '16 at 22:12
  • This answer is [obsolete](https://stackoverflow.com/a/43829731/656912). – orome Jun 01 '17 at 11:53
  • `tf.matmul` supports batch matmul. For 3D tensors, it performs matmul over the axes that are not the batch axis. E.g. if `A` has shape `(N, T, D)` and `B` has shape `(N, D, V)`, then, disregarding batch axis 0 (indicated by N), `tf.matmul(A, B)` gives shape `(N, T, V)`, just as matmul of tensors of shape `(T, D)` and `(D, V)` gives a tensor of shape `(T, V)`. – kwotsin Aug 15 '18 at 08:03
  • Deprecated answer :) – antonioACR1 Sep 02 '18 at 03:49
4

As answered by @Styrke, there are two ways to achieve this: 1. scanning, and 2. reshaping

  1. tf.scan requires lambda functions and is generally used for recursive operations. Some examples are here: https://rdipietro.github.io/tensorflow-scan-examples/

  2. I personally prefer reshaping, since it is more intuitive. If you are trying to matrix-multiply each matrix in a 3D tensor by a 2D matrix, as in Cijl = Aijk * Bkl (summing over k), you can do it with a simple reshape.

    A2 = tf.reshape(Aijk, [i * j, k])   # collapse the first two dimensions
    C2 = tf.matmul(A2, Bkl)             # (i*j, k) x (k, l) -> (i*j, l)
    Cijl = tf.reshape(C2, [i, j, l])    # restore the leading dimensions
    
Desh Raj
0

It seems that in TensorFlow 1.11.0 the docs for tf.matmul incorrectly say that it works for rank >= 2.

Instead, the best clean alternative I've found is to use tf.tensordot(a, b, (-1, 0)) (docs).

In its general form, tf.tensordot(a, b, axes) contracts the given axis of array a with the given axis of array b. Passing axes as (-1, 0) contracts the last axis of a with the first axis of b, which is the standard dot product of the two arrays.
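For the question's setup, this would look like the following sketch (n and k are assumed symbols, with embed of shape (?, n, m) and U of shape (m, k)):

h = tf.tensordot(embed, U, axes=[[2], [0]])  # (?, n, m) x (m, k) -> (?, n, k)
h.set_shape([None, n, k])  # tensordot can lose static shape information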

James Fletcher