12

I have a problem I've been struggling with, related to tf.matmul() and its lack of broadcasting.

I am aware of a similar issue on https://github.com/tensorflow/tensorflow/issues/216, but tf.batch_matmul() doesn't look like a solution for my case.

I need to encode my input data as a 4D tensor:

X = tf.placeholder(tf.float32, shape=(None, None, None, 100))

The first dimension is the size of a batch and the second is the number of entries in the batch. You can imagine each entry as a composition of a number of objects (third dimension). Finally, each object is described by a vector of 100 float values.

Note that I used None for the second and third dimensions because the actual sizes may change in each batch. However, for simplicity, let's shape the tensor with actual numbers:

X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))

These are the steps of my computation:

  1. Compute a function of each vector of 100 float values (e.g., a linear function):

     W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
     Y = tf.matmul(X, W)

     Problem: tf.matmul() does not broadcast, and I had no success with tf.batch_matmul(). Expected shape of Y: (5, 10, 4, 50).

  2. Apply average pooling for each entry of the batch (over the objects of each entry):

     Y_avg = tf.reduce_mean(Y, 2)

     Expected shape of Y_avg: (5, 10, 50).

I expected tf.matmul() to support broadcasting. Then I found tf.batch_matmul(), but it doesn't seem to apply to my case either (e.g., W needs to have at least 3 dimensions, and it's not clear why).

By the way, above I used a simple linear function (whose weights are stored in W), but in my model I have a deep network instead. So the more general problem is automatically computing a function on each slice of a tensor. This is why I expected tf.matmul() to broadcast (if it did, maybe tf.batch_matmul() wouldn't even be necessary).

Look forward to learning from you! Alessio

Alessio B

2 Answers

9

You can achieve this by reshaping X to shape [n, d], where d is the dimensionality of a single "instance" of the computation (100 in your example) and n is the number of such instances in your multi-dimensional object (5*10*4 = 200 in your example). After reshaping, you can use tf.matmul and then reshape back to the desired shape. The fact that the first three dimensions can vary makes this a little tricky, but you can use tf.shape to determine the actual shapes at run time. Finally, you can perform the second step of your computation, which is a simple tf.reduce_mean over the respective dimension. All in all, it looks like this:

X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
X_ = tf.reshape(X, [-1, 100])                # stack all 100-dim vectors into one big matrix
Y_ = tf.matmul(X_, W)                        # shape [n, 50], one row per vector
X_shape = tf.gather(tf.shape(X), [0, 1, 2])  # extract the first three (dynamic) dimensions
target_shape = tf.concat(0, [X_shape, [50]]) # append 50 as the last dimension
Y = tf.reshape(Y_, target_shape)             # back to shape (batch, entries, objects, 50)
Y_avg = tf.reduce_mean(Y, 2)                 # average over the objects dimension -> (batch, entries, 50)
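If it helps, here is a minimal shape check (just a sketch, assuming the graph above and the session/initializer API of the TensorFlow version used in this answer; the input is random data):

import numpy as np

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    feed = {X: np.random.rand(5, 10, 4, 100).astype(np.float32)}
    y, y_avg = sess.run([Y, Y_avg], feed_dict=feed)
    print(y.shape)      # (5, 10, 4, 50)
    print(y_avg.shape)  # (5, 10, 50)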
lballes
  • Thanks for your answer. Unfortunately, your solution has two issues: 1. it averages over *all* the vectors, which is not correct 2. reshaping is valid only in the case of a fixed shape tensor, whereas I have batches in which the first 3 dimensions vary (fixed in each batch, different across batches) – Alessio B Jun 29 '16 at 07:13
  • Why does it average across all vectors? ``X[i, j, k, :]`` constitutes a single vector, right? By reshaping in the way I proposed, we are stacking these vectors in a large matrix (each row holding one of the vectors). If we now do the matrix multiplication, each row gets multiplied with the matrix separately. Now we can do with each row what's desired (e.g. taking the average, as in your example) and then re-arranging to the shape we want to have. I don't see where we are taking an average over the vectors, but I might be missing something. – lballes Jun 29 '16 at 07:24
  • Regarding the second issue, as long as the dimensionality of the vectors (100 in your example) is fixed, ``tf.reshape(X, [-1, 100])`` should work fine? Using the ``-1``, there is no need to know the other dimensions a priori. – lballes Jun 29 '16 at 07:26
  • True, it doesn't average across all the vectors, but what you implemented is not what I need. What your code does is getting a scalar for each vector, whereas I need an average vector for each slice of the tensor. In fact, in my question I indicated that the expected shape of Y_avg has to be (5, 10, 50): the 3rd dimension disappears because we get an average vector of 50 elements. For the second issue, true that I can use -1 to reshape, but then I cannot go back to the original tensor form. My apologies if my question isn't 100% clear and again thanks a lot for your help! – Alessio B Jun 29 '16 at 10:00
  • PS: `Y_avg_` in your code has shape (200), hence you can't even reshape with `Y_avg = tf.reshape(Y_avg_, [5, 10, 50])` because `200 != 5 * 10 * 50` – Alessio B Jun 29 '16 at 10:02
  • Oh yes, I see. I totally misread the second part of your computation. Nevertheless, I think what you want to do can be achieved with ``tf.reshape``. I'll edit the answer! – lballes Jun 29 '16 at 11:59
  • Cool! That's it! I finally learned how to pass the target shape as a tensor. Thanks a lot Lukas :) – Alessio B Jun 30 '16 at 06:50
  • Ahhh! Thanks guys! Been struggling with this problem for a while :):):):):) – stianlp Feb 19 '17 at 22:57
2

As the renamed title of the GitHub issue you linked suggests, you should use tf.tensordot(). It enables contraction over pairs of axes between two tensors, in line with NumPy's tensordot(). For your case:

X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.tensordot(X, W, [[3], [0]])  # gives shape=[5, 10, 4, 50]
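To complete the computation from the question, the pooling step is then just (a quick sketch reusing Y from above):

Y_avg = tf.reduce_mean(Y, 2)  # average over the objects axis -> shape (5, 10, 50)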
buzjwa