I've seen tf.identity used in a few places, such as the official CIFAR-10 tutorial and the batch-normalization implementation on Stack Overflow, but I don't see why it's necessary. What is it used for? Can anyone give a use case or two?
One proposed answer is that it can be used for transfers between the CPU and GPU, but that isn't clear to me. As an extension to the question, based on this: loss = tower_loss(scope) sits inside the GPU device block, which suggests to me that all operators defined in tower_loss are mapped to the GPU. Then, at the end of tower_loss, we see total_loss = tf.identity(total_loss) before total_loss is returned. Why? What would be the flaw in not using tf.identity here?
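For reference, the structure I'm referring to looks roughly like this. It is a heavily trimmed sketch, not the tutorial's actual code: a dummy loss stands in for the real model, and only the device block, the moving-average update, and the final tf.identity line follow the pattern I'm asking about.

```python
import tensorflow as tf  # TF1-style graph API, as in the tutorial

def tower_loss(scope):
    # scope is unused in this trimmed sketch; the real tower_loss uses it
    # to collect the losses belonging to this tower.
    # Stand-in for the real model: a single dummy loss so the sketch
    # is self-contained.
    total_loss = tf.reduce_mean(tf.square(tf.random_normal([8, 10])),
                                name='total_loss')

    # The tutorial keeps a moving average of the losses and makes the
    # returned loss depend on that update.
    loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
    loss_averages_op = loss_averages.apply([total_loss])

    with tf.control_dependencies([loss_averages_op]):
        # The line I'm asking about: why wrap total_loss in tf.identity
        # instead of returning it directly?
        total_loss = tf.identity(total_loss)
    return total_loss

# All ops built inside this block are pinned to the GPU, which is what
# made me think tf.identity might relate to CPU/GPU transfer.
with tf.device('/gpu:0'):
    with tf.name_scope('tower_0') as scope:
        loss = tower_loss(scope)
```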