
So I know this has something to do with how TensorFlow builds the graph, and that it doesn't do it very... "efficiently". Here's the dummy code I'm running:

import tensorflow as tf

@tf.function
def parTest(x_in):
    res = 0
    for i in range(5000):
        res += x_in + i
    return res

Running that function without TensorFlow takes 0.002 seconds, but running it with TensorFlow takes between 10 and 20 seconds. This makes no sense to me. What's going on here, and how do I fix it? The actual value of res here can obviously be calculated in a more efficient way, but the real problem I'm having is that I have a for loop whose iterations can all be run independently of each other, yet TensorFlow refuses to do this and runs them really slowly, one by one, just like in this dummy example. So how do I tell TensorFlow not to do this?

Akshat Zala

1 Answer


Loops are never very efficient in TensorFlow. However, this function is particularly bad for TensorFlow, because it will try to "unroll" the whole loop statically. That is, it will not "translate" your function into a tf.while_loop, but instead will literally create 5000 copies of the operations in each iteration. That is a very big graph, which on top of that will always run sequentially. I actually get a warning about this in TensorFlow 2.2.0, which points you to this information page: "WARNING: Large unrolled loop detected".

As mentioned in that link, the problem is that TensorFlow cannot (at least at the moment) detect loops over arbitrary iterators, not even if they are a simple range, so it just runs the loop in Python and creates the corresponding operations. You can avoid that either by writing the tf.while_loop yourself or, thanks to AutoGraph, simply by replacing your range with a tf.range:

import tensorflow as tf
@tf.function
def parTest(x_in):
    res = 0
    for i in tf.range(5000):
        res += x_in + i
    return res

Still, writing your own tf.while_loop (only when absolutely necessary, as vectorized operations will always be faster) gives you more explicit control over details such as the parallel_iterations parameter.
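For reference, a minimal sketch of what the explicit tf.while_loop version might look like, assuming TF 2.x and using the 5000-iteration sum from the question (function and variable names are my own):

```python
import tensorflow as tf

@tf.function
def par_test_while(x_in):
    # Loop state: iteration counter and running sum.
    i = tf.constant(0)
    res = tf.zeros_like(x_in)
    cond = lambda i, res: i < 5000
    # Each step adds (x_in + i) to the accumulator, matching the original loop.
    body = lambda i, res: (i + 1, res + x_in + tf.cast(i, x_in.dtype))
    _, res = tf.while_loop(cond, body, [i, res], parallel_iterations=10)
    return res
```

Note that this particular reduction still runs sequentially, since every step depends on the previous value of res; parallel_iterations only pays off when the loop iterations are actually independent of each other.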

jdehesa
  • Thank you for the detailed explanation :) The real issue I'm having is that I am creating a list in a for loop by appending the results of calling a tf.while_loop inside the for loop. I'd much rather build that list in parallel, since there are no dependencies between the iterations of the for loop. The entries added to the list have different shapes, so I haven't found any way of vectorizing the operations :/ I will try the tf.range and see if it helps tho :) – Beacon of Wierd Jun 18 '20 at 10:35
  • @BeaconofWierd If you cannot get it to work feel free to edit the question with more information about your specific case (or post a new question if you think that's better). You might try [making a dataset out of your list of tensors](https://stackoverflow.com/q/47580716/1782792) and then iterating the dataset, which should convert the loop into dataset operations. – jdehesa Jun 18 '20 at 10:41
  • Making a dataset out of the tensors feels wrong, or rather like there should be some simpler solution. If I simply write out the for loop as a bunch of repeated code it works fine (but that obviously doesn't work in general, since the actual loop size will vary), so the issue is really how to run/make a for loop in parallel in TensorFlow. This is my original question: https://stackoverflow.com/questions/62169648/how-do-i-make-tensorflow-evaluate-tensors-in-parallel – Beacon of Wierd Jun 18 '20 at 10:51
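For completeness, a rough sketch of the dataset approach suggested in the comments, assuming TF 2.4+ (for output_signature); the tensor shapes and values here are made up purely for illustration:

```python
import tensorflow as tf

# Hypothetical list of tensors with different shapes, as in the comment above.
tensors = [tf.ones([2]), tf.ones([3]), tf.ones([5])]

# Wrap the list in a dataset; from_generator handles the varying shapes
# via a TensorSpec with an unknown leading dimension.
ds = tf.data.Dataset.from_generator(
    lambda: tensors,
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.float32),
)

@tf.function
def total(dataset):
    acc = tf.constant(0.0)
    # Iterating a tf.data.Dataset inside tf.function is converted to
    # dataset ops by AutoGraph rather than unrolled into the graph.
    for t in dataset:
        acc += tf.reduce_sum(t)
    return acc
```

Whether this is faster than the per-element tf.while_loop calls depends on the workload; it mainly avoids the static unrolling problem described in the answer.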