I have a variable that contains the 4x4 identity matrix. I want to assign some values to this matrix (these values are learned by the model).

When I use tf.assign() I get an error saying that strided slices do not have gradients. How can I do this without using tf.assign()?

Here is sample code showing the desired behaviour (without the error, since the values are not learned here):

import tensorflow as tf

params = [[1.0, 2.0, 3.0]]
M = tf.Variable(tf.eye(4, batch_shape=[1]), dtype=tf.float32)
M = tf.assign(M[:, 0:3, 3], params)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
output_val = sess.run(M)

Note - the variable is created solely for the purpose of housing these parameters.

UPDATE: I am adding a minimal working example that reproduces the error. (Obviously, training like this won't result in anything good. It's just to illustrate the error, since my code is far too long to copy here.)

import numpy as np
import tensorflow as tf

params = [[1.0, 2.0, 3.0]]
M_gt = np.eye(4)
M_gt[0:3, 3] = [4.0, 5.0, 6.0]

M = tf.Variable(tf.eye(4, batch_shape=[1]), dtype=tf.float32)
M = tf.assign(M[:, 0:3, 3], params)

loss = tf.nn.l2_loss(M - M_gt)
optimizer = tf.train.AdamOptimizer(0.001)
train_op = optimizer.minimize(loss)
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
sess.run(train_op)
itzik Ben Shabat
  • Possible duplicate of [How to do slice assignment in Tensorflow](https://stackoverflow.com/questions/39157723/how-to-do-slice-assignment-in-tensorflow) – jdehesa Mar 26 '18 at 09:10
  • @jdehesa, thanks. The question is similar but I didn't fully understand the answer. What does `with tf.control_dependencies([my_var[4:8].assign(tf.zeros(4))]): my_var = tf.identity(my_var)` do? I have multiple assignment lines in my final code, do I need to copy this multiple times? – itzik Ben Shabat Mar 27 '18 at 08:28
  • Actually, looking at [the docs](https://www.tensorflow.org/api_docs/python/tf/Variable#__getitem__) again, I think it's fine if you just do `my_var = my_var[4:8].assign(tf.zeros(4))`; the return value of [`assign`](https://www.tensorflow.org/api_docs/python/tf/Variable#assign), even when it's applied to a slice, is the value of the *whole* variable after the assignment has taken place (I'll fix/add a comment in the other answer)... – jdehesa Mar 27 '18 at 09:03
  • @jdehesa ok but this still doesn't solve my problem. After using your suggested way of assignment I still get the error `No gradient defined for operation strided slice...` (I also had to define each assignment as a variable again for some reason) – itzik Ben Shabat Mar 28 '18 at 07:13
  • Mmm, maybe you could show a minimal example where you get the error, to have a better idea of what you are trying to do? – jdehesa Mar 28 '18 at 08:29
  • @jdehesa I added a minimal example that creates the error. – itzik Ben Shabat Mar 28 '18 at 10:49
  • Ahh I see, so you want to replace a block in `M` with some values and use that later, right? And do you really want to replace the stored value in `M`? Or, do you really need a variable at all, or just something that gets that block replaced? – jdehesa Mar 28 '18 at 11:33
  • @jdehesa I want to replace the values in M with some parameters that are learned by a DNN. This M will take part in the loss function (similar to the example) so the gradients should update these parameters. I am currently trying a workaround with some `tf.concat` (similar to the answer in the post you referenced) but I thought there might be a better solution. – itzik Ben Shabat Mar 28 '18 at 11:54
  • Right, I understand. Afaik, I think that you need to build the matrix by hand, like you suggest. I posted a possible answer, not sure if it's the best way but I think it works for your case (at least for the example). – jdehesa Mar 28 '18 at 12:10
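
The "build the matrix by hand" idea discussed in the comments can be sketched roughly as follows. This is only an illustration, not the asker's code: `build_transform` is a hypothetical helper, and NumPy stands in for TensorFlow so the snippet runs without a session. In a TensorFlow graph the same assembly would use `tf.concat` in place of `np.concatenate`; because the matrix is then computed from the parameters instead of being assigned into a variable, gradients can flow back to them.

```python
import numpy as np

def build_transform(params):
    """Assemble a batch of 4x4 matrices whose last column holds `params`.

    `params` has shape (batch, 3). Hypothetical helper, for illustration only.
    """
    params = np.asarray(params, dtype=np.float32)
    batch = params.shape[0]
    # Top-left 3x3 identity block, repeated for every batch element.
    eye3 = np.tile(np.eye(3, dtype=np.float32)[np.newaxis], (batch, 1, 1))
    # Top three rows: [I | params], shape (batch, 3, 4).
    top = np.concatenate([eye3, params[:, :, np.newaxis]], axis=2)
    # Bottom row [0, 0, 0, 1], shape (batch, 1, 4).
    bottom_row = np.array([[[0.0, 0.0, 0.0, 1.0]]], dtype=np.float32)
    bottom = np.tile(bottom_row, (batch, 1, 1))
    # Stack the two pieces along the row axis, shape (batch, 4, 4).
    return np.concatenate([top, bottom], axis=1)

M = build_transform([[1.0, 2.0, 3.0]])
print(M)
```

With several blocks to fill in, each one would be assembled the same way before the final concatenation, so nothing needs to be assigned afterwards.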

1 Answer

Here is an example of how you could do what (I think) you want:

import tensorflow as tf
import numpy as np

with tf.Graph().as_default(), tf.Session() as sess:
    params = [[1.0, 2.0, 3.0]]
    M_gt = np.eye(4)
    M_gt[0:3, 3] = [4.0, 5.0, 6.0]

    M = tf.Variable(tf.eye(4, batch_shape=[1]), dtype=tf.float32)
    params_t = tf.constant(params, dtype=tf.float32)

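    # Dynamic sizes: batch size, matrix side length, number of parameters.
    shape_m = tf.shape(M)
    batch_size = shape_m[0]
    num_m = shape_m[1]
    num_params = tf.shape(params_t)[1]

    # Last column: the parameters on top, zero-padded down to num_m rows.
    last_column = tf.concat([tf.tile(tf.transpose(params_t)[tf.newaxis], (batch_size, 1, 1)),
                             tf.zeros((batch_size, num_m - num_params, 1), dtype=params_t.dtype)], axis=1)
    # Full-size tensor that is zero everywhere except in the last column.
    replace = tf.concat([tf.zeros((batch_size, num_m, num_m - 1), dtype=params_t.dtype), last_column], axis=2)

    # Boolean mask that is True for the first num_params rows of the last column.
    r = tf.range(num_m)
    ii = r[tf.newaxis, :, tf.newaxis]
    jj = r[tf.newaxis, tf.newaxis, :]
    mask = tf.tile((ii < num_params) & (tf.equal(jj, num_m - 1)), (batch_size, 1, 1))
    # Take values from `replace` where the mask is True and from M elsewhere.
    M_replaced = tf.where(mask, replace, M)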

    loss = tf.nn.l2_loss(M_replaced - M_gt[np.newaxis])
    optimizer = tf.train.AdamOptimizer(0.001)
    train_op = optimizer.minimize(loss)
    # The session opened by the `with` statement above is reused here.
    init = tf.global_variables_initializer()
    sess.run(init)
    M_val, M_replaced_val = sess.run([M, M_replaced])
    print('M:')
    print(M_val)
    print('M_replaced:')
    print(M_replaced_val)

Output:

M:
[[[ 1.  0.  0.  0.]
  [ 0.  1.  0.  0.]
  [ 0.  0.  1.  0.]
  [ 0.  0.  0.  1.]]]
M_replaced:
[[[ 1.  0.  0.  1.]
  [ 0.  1.  0.  2.]
  [ 0.  0.  1.  3.]
  [ 0.  0.  0.  1.]]]
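
To see what the mask and the `tf.where` step compute, the same selection can be mirrored in NumPy. This is a rough sketch under the assumption of a single batch element, with `np.where` and `np.broadcast_to` standing in for `tf.where` and `tf.tile`. For brevity `replace` is filled with a NumPy slice assignment here; the TensorFlow code above builds it with `tf.concat` instead.

```python
import numpy as np

M = np.eye(4, dtype=np.float32)[np.newaxis]              # shape (1, 4, 4)
params = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)   # shape (1, 3)

batch_size, num_m = M.shape[0], M.shape[1]
num_params = params.shape[1]

# Same-shape array carrying the new values in rows 0..2 of the last column.
replace = np.zeros_like(M)
replace[:, :num_params, num_m - 1] = params

# Boolean mask: True exactly where a value should be replaced.
r = np.arange(num_m)
ii = r[np.newaxis, :, np.newaxis]   # row index, shape (1, 4, 1)
jj = r[np.newaxis, np.newaxis, :]   # column index, shape (1, 1, 4)
mask = np.broadcast_to((ii < num_params) & (jj == num_m - 1), M.shape)

# Pick from `replace` where the mask is True, from M elsewhere.
M_replaced = np.where(mask, replace, M)
print(mask[0].astype(int))
print(M_replaced[0])
```

Only the three masked entries come from `replace`; every other entry, including the 1 in the bottom-right corner, is taken from `M`.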
jdehesa