
First, I'm not sure the title is very good, but it was the best I could come up with given my understanding of the situation.

The background is that I'm trying to understand how queues work in TensorFlow, and I ran into the following issue, which puzzled me.

I have a variable n, which I enqueue to a tf.FIFOQueue, and then I increment the variable. This is repeated several times, so one would expect the dequeued values to be something like 0, 1, 2, ... However, when emptying the queue, all the values are the same.

More precisely, the code is as follows:

from __future__ import print_function

import tensorflow as tf

q = tf.FIFOQueue(10, tf.float32)

n = tf.Variable(0, trainable=False, dtype=tf.float32)
inc = n.assign(n+1)
enqueue = q.enqueue(n)

init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)

sess.run(enqueue)
sess.run(inc)

sess.run(enqueue)
sess.run(inc)

sess.run(enqueue)
sess.run(inc)

print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))
print(sess.run(q.dequeue()))

I expected this to print:

0.0
1.0
2.0

Instead I get the following result:

3.0
3.0
3.0

It seems like I'm pushing some pointer to n onto the queue instead of the actual value, which is what I want. However, I don't really have any understanding of TensorFlow internals, so maybe something else is going on?

I tried changing

enqueue = q.enqueue(n)

to

enqueue = q.enqueue(tf.identity(n))

since the answers to How can I copy a variable in tensorflow and In TensorFlow, what is tf.identity used for? gave me the impression that it might help, but it does not change the result. I also tried adding a tf.control_dependencies(), but again, all the values are the same when dequeuing.

Edit: The output above is from running the code on a computer with a single CPU. While trying to see if there was some difference between versions of TensorFlow, I noticed that if I run the code on a computer with both a CPU and a GPU, I get the "expected" result. Indeed, if I run with CUDA_VISIBLE_DEVICES="" I get the result above, and with CUDA_VISIBLE_DEVICES="0" I get the "expected" result.


2 Answers


To force a non-caching read, you can do

q.enqueue(tf.add(n, 0))

This is what's currently done by the batch-normalization layer to force a copy.
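
Applied to the snippet from the question, a minimal sketch of the change (everything else in the code stays the same):

n = tf.Variable(0, trainable=False, dtype=tf.float32)
inc = n.assign(n + 1)
# tf.add(n, 0) forces a fresh read of n, so the current value gets enqueued
enqueue = q.enqueue(tf.add(n, 0))

With that change, the three dequeues should print 0.0, 1.0 and 2.0.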

The semantics of how variables get read vs. referenced are in the process of being revamped, so they are temporarily non-intuitive. In particular, I expected q.enqueue(v.read_value()) to force a non-caching read, but it doesn't fix your example on TF 0.12rc1.

Using a GPU machine puts the variable on the GPU, while the Queue is CPU-only, so the enqueue op forces a GPU->CPU copy.
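
Conversely, if you want to reproduce the cached behavior on a machine that does have a GPU, a sketch (assumption: explicitly pinning the variable to the CPU with tf.device) would be:

with tf.device('/cpu:0'):
    n = tf.Variable(0, trainable=False, dtype=tf.float32)

With the variable forced onto the CPU, no cross-device copy happens on enqueue, so you should see the same repeated value as in the question.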

Yaroslav Bulatov

In case it helps, I've found that the other answers, despite being correct, do not work for all dtypes.

For example, this works fine with floats or ints but fails when n is a string tensor:

q.enqueue(tf.add(n, 0))

This one fails when the queue uses tuples with heterogeneous types (e.g., ints and floats):

q.enqueue_many([[n]])

So, if you find yourself caught in any of these situations, try this instead:

q.enqueue(tf.add(n, tf.zeros_like(n)))

Or, to enqueue a tuple t:

q.enqueue([tf.add(n, tf.zeros_like(n)) for n in t])

That works even for string tensors and heterogeneous tuple types.
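
As a concrete sketch (the queue and variable names here are made up, and this assumes a TF 1.x graph), for a queue holding (int32, float32) tuples:

import tensorflow as tf

# a queue whose elements are (int32, float32) tuples
q = tf.FIFOQueue(10, (tf.int32, tf.float32))

a = tf.Variable(0, dtype=tf.int32)
b = tf.Variable(0.0, dtype=tf.float32)
t = (a, b)

# enqueue copies of the current values, one tensor per tuple element
enqueue = q.enqueue([tf.add(n, tf.zeros_like(n)) for n in t])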

Hope it helps!

--

Update: it looks like tf.bool types do not work with tf.zeros_like(). For those, an explicit cast to an integer type might be needed.
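
For instance, for a hypothetical tf.bool variable flag, something along these lines (untested sketch) should do it:

# cast to an integer type, apply the copy trick, then cast back to bool
q.enqueue(tf.cast(tf.add(tf.cast(flag, tf.int32), 0), tf.bool))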

user3176103