7

This is a pretty simple question that I just can't seem to figure out. I am working with an output tensor of shape [100, 250]. I want to be able to access each 250-dimensional vector along the first dimension of 100 and modify them separately. The TensorFlow mathematical tools that I've found either do element-wise modification or scalar modification on the entire tensor. However, I am trying to do scalar modification on subsets of the tensor.

EDIT:

Here is the numpy code that I would like to recreate with tensorflow methods:

update = sess.run(y, feed_dict={x: batch_xs})
for i in range(len(update)):
    # divide each row by its own L2 norm, then scale it to the desired length
    update[i] = update[i] / np.sqrt(np.sum(np.square(update[i])))
    update[i] = update[i] * magnitude

This for loop follows the unit vector formula (v / ||v||) in 250-D instead of 3-D; that is the first line of the for-loop. I then multiply each unit vector by magnitude to re-scale it to my desired length.
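As noted in the comments, the same per-row normalization can be written without the loop. A sketch, assuming update and magnitude are the same variables as above:

# keepdims keeps the summed axis, so the [100, 1] norms broadcast against [100, 250]
norms = np.sqrt(np.sum(np.square(update), axis=1, keepdims=True))
update = update / norms * magnitude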

So update here is the [100, 250] numpy output. I want to transform each 250-dimensional vector into its unit vector, so that I can then scale its length to a magnitude of my choosing. Using this numpy code, if I run my train_step and pass update into one of my placeholders

sess.run(train_step, feed_dict={x: batch_xs, prediction: output}) 

it returns the error:

No gradients provided for any variable

This is because I've done the math in numpy and ported it back into tensorflow. Here is a related stackoverflow question that did not get answered.

tf.nn.l2_normalize is very close to what I am looking for, but it divides by the square root of the maximum sum of squares, whereas I am trying to divide each vector by the square root of its own sum of squares.

Thanks!

  • So, a different way to phrase this question is: if I have an [x, y] shaped tensor and an [x, 1] shaped tensor, can I do mathematical operations such that the value in the '1' category affects every respective value in the 'y' category? Not element-wise, but not specifically scalar multiplication either. – Andrew Draganov Jun 27 '16 at 18:54
  • I find your question very unclear. Could you give the code for how you would do it in numpy? – Olivier Moindrot Jun 27 '16 at 20:11
  • maybe using the map function of tensorflow? It is available in 0.9 – jorgemf Jun 27 '16 at 20:39
  • Does tf.nn.l2_normalize do what you want? – Peter Hawkins Jun 27 '16 at 23:09
  • Hi, sorry for the late reply. I have edited my post to hopefully explain everything more clearly. – Andrew Draganov Jun 28 '16 at 20:14
  • Why are you doing a `for` loop? `numpy` is perfectly capable of doing the whole operation in basically one line: `update /= np.sqrt(np.sum(np.square(update), axis=1)); update *= magnitude`. The thing you are trying to do in your initial comment seems to be related to broadcasting. – Mad Physicist Jun 28 '16 at 20:34
  • The key is passing `axis=1` to `np.sum`, which reduces the (100, 250) array of squares to a (100,) array. – Mad Physicist Jun 28 '16 at 20:35
  • Right, that is true. Regardless of how I do it in numpy, however, I can't figure out how to transfer this into TensorFlow. If I come out and re-arrange my values in NumPy, then Tensorflow loses track of them and can't compute gradients. Your response is a better way to do it in numpy, though, thank you. – Andrew Draganov Jun 28 '16 at 20:38
  • Missed a step: `update /= np.expand_dims(np.sqrt(np.sum(np.square(update), axis=1)), axis=1);`. Need `expand_dims` for broadcasting to work. – Mad Physicist Jun 28 '16 at 20:40
  • Given that `tf` vectors appear to be extensions of `np` arrays, I am pretty sure you can just call `tf.*` instead of `np.*` for the example I gave you to preserve the metadata. Something like `update /= tf.expand_dims(tf.sqrt(tf.reduce_sum(tf.square(update), axis=1)), axis=1);` should work. – Mad Physicist Jun 28 '16 at 20:55

2 Answers

8

There is no real trick here; you can do it just as in numpy.
The only thing to make sure of is that norm has shape [100, 1] so that it broadcasts correctly in the division x / norm.

x = tf.ones([100, 250])

norm = tf.sqrt(tf.reduce_sum(tf.square(x), axis=1, keepdims=True))
assert norm.shape == [100, 1]

res = x / norm
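And if the point is to avoid the "No gradients provided" error from the question, the normalization just has to stay inside the graph. A rough sketch (the names y, magnitude, target and the squared-error loss are only placeholders for whatever the actual model uses):

norm = tf.sqrt(tf.reduce_sum(tf.square(y), axis=1, keepdims=True))
prediction = magnitude * y / norm   # each row of y rescaled to the chosen length, still differentiable
loss = tf.reduce_mean(tf.square(prediction - target))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)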
Olivier Moindrot
  • Holy Crap! I've always thought `keepdims` wasn't useful. Finally a use case that makes sense. Thanks! – gokul_uf Sep 06 '18 at 16:57
0

You can use tf.norm to get the square root of the sum of squares. (TF version == 1.4 in my code.)

Example code:

  import tensorflow as tf
  a = tf.random_uniform((3, 4))
  b = tf.norm(a, keep_dims=True)           # norm over the whole tensor, shape (1, 1)
  c = tf.norm(a, axis=1, keep_dims=True)   # per-row norms, shape (3, 1)
  d = a / c                                # each row divided by its own norm
  e = a / tf.sqrt(tf.reduce_sum(tf.square(a), axis=1, keep_dims=True) + 1e-8)
  f = a / tf.sqrt(tf.reduce_sum(tf.square(a), axis=1, keep_dims=True))
  g = tf.sqrt(tf.reduce_sum(tf.square(a), axis=1, keep_dims=True))
  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    a_eval, b_eval, c_eval, d_eval, e_eval, f_eval, g_eval = sess.run([a, b, c, d, e, f, g])
    print(a_eval)
    print(b_eval)
    print(c_eval)
    print(d_eval)
    print(e_eval)
    print(f_eval)
    print(g_eval)

output:

[[ 0.29823065  0.76523042  0.40478575  0.44568062]
 [ 0.0222317   0.12344956  0.39582515  0.66143286]
 [ 0.01351094  0.38285756  0.46898723  0.34417391]]
[[ 1.4601624]]
[[ 1.01833284]
 [ 0.78096414]
 [ 0.6965394 ]]
[[ 0.29286167  0.75145411  0.39749849  0.43765712]
 [ 0.02846699  0.15807328  0.50684166  0.84694397]
 [ 0.01939724  0.54965669  0.6733104   0.49411979]]
[[ 0.29286167  0.75145411  0.39749849  0.43765712]
 [ 0.02846699  0.15807328  0.50684166  0.84694397]
 [ 0.01939724  0.54965669  0.6733104   0.49411979]]
[[ 0.29286167  0.75145411  0.39749849  0.43765712]
 [ 0.02846699  0.15807328  0.50684166  0.84694397]
 [ 0.01939724  0.54965669  0.6733104   0.49411979]]
[[ 1.01833284]
 [ 0.78096414]
 [ 0.6965394 ]]

You can see that there's no difference between a / tf.norm(a, axis=1, keep_dims=True) and a / tf.sqrt(tf.reduce_sum(tf.square(a), axis=1, keep_dims=True) + 1e-8).

a / tf.sqrt(tf.reduce_sum(tf.square(a), axis=1, keep_dims=True) + 1e-8) is preferred because it handles the all-zero case without dividing by zero.
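A quick way to check the zero case (same TF 1.4 style as above; with an all-zero row the plain division produces NaNs, while the + 1e-8 version just returns zeros):

  z = tf.zeros((1, 4))
  unsafe = z / tf.sqrt(tf.reduce_sum(tf.square(z), axis=1, keep_dims=True))           # 0 / 0 -> nan
  safe = z / tf.sqrt(tf.reduce_sum(tf.square(z), axis=1, keep_dims=True) + 1e-8)      # stays 0
  with tf.Session() as sess:
    print(sess.run([unsafe, safe]))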

user5746429