I would like to auto-differentiate across a rather complex function that I wish to parallelize.

I am using TensorFlow 2.x with tf.GradientTape for differentiation.

I have made a toy example that illustrates the point. The auto-differentiation works just fine without threading but breaks when the exact same calculation is run in two separate threads.

import tensorflow as tf
import threading

# This ThreadWithResult is from https://stackoverflow.com/a/65447493/1935801 and works fine on its own
class ThreadWithResult(threading.Thread):
    def __init__(self, group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None):
        def function():
            # Wrap the target so its return value is captured on the thread object.
            self.result = target(*args, **kwargs)
        super().__init__(group=group, target=function, name=name, daemon=daemon)

def my_function(x):
    return x*x + x*x*x

def my_function_threaded(x):
    def square(x):
        result = x*x
        return result

    def cube(x):
        result = x*x*x
        return result

    t1 = ThreadWithResult(target=square, args=(x,))
    t2 = ThreadWithResult(target=cube, args=(x,))

    t1.start()
    t2.start()

    t1.join()
    t2.join()

    y = t1.result + t2.result

    return y

x = tf.constant(3.0)
print("my_function(x) =", my_function(x))
print("my_function_threaded(x) =", my_function_threaded(x))

with tf.GradientTape() as tape:
    tape.watch(x)
    y = my_function(x)

dy_dx = tape.gradient(y, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
print("Simple dy_dx", dy_dx)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = my_function_threaded(x)

dy_dx = tape.gradient(y, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
print("Threaded dy_dx", dy_dx)

As one can see in the output below, the gradient comes back as zero when the same simple calculation is split across threads.

my_function(x) = tf.Tensor(36.0, shape=(), dtype=float32)
my_function_threaded(x) = tf.Tensor(36.0, shape=(), dtype=float32)
Simple dy_dx tf.Tensor(33.0, shape=(), dtype=float32)
Threaded dy_dx tf.Tensor(0.0, shape=(), dtype=float32)

Any suggestions or ideas on how I could parallelize my function within GradientTape would be much appreciated.

Morten Grum
  • If you're just adding the result of the two models, then you could compute the gradient of the two functions separately, then add the gradient together. If you have a more complicated way of ensembling the two models that's not shown by the toy example, then I don't know. – Nick ODell Dec 07 '21 at 18:18
  • Thanks @Nick ODell. And, yes, I could calculate the gradients per thread and then combine them if it were as simple as the toy example. Unfortunately, as you suggest, the toy example is an oversimplification of the real thing. It is actually a network of interdependent models, and while it is perhaps not impossible to combine the results after using GradientTape on each thread, it would no longer be "auto". So I was wondering if (and hoping that) there was some way of doing such a complex auto-differentiation in parallel without needing to program the book-keeping myself. – Morten Grum Dec 07 '21 at 18:48
  • Totally understandable. I'll tell you what I was able to figure out while debugging. I tried running t1 and t2 one after the other, and that didn't help. So it's not a thread safety issue, or at least not *just* a thread safety issue. – Nick ODell Dec 07 '21 at 19:06
  • What device do you run your codes on? CPU or GPU? – Laplace Ricky Dec 07 '21 at 23:41
  • @Laplace Ricky, my code is currently running on CPU but I expect that I will have it on GPU some time in the future. How might this impact my options? – Morten Grum Dec 08 '21 at 03:08
  • 1
    On CPU, tensorflow will automatically parallelize operations in a tensorflow graph, you do not need to do the threading yourself. Instead, you need to use `tf.function` to create a tensorflow graph. On GPU, the situation is much more complicated and most of the time the operations will be performed sequentially if not supported. – Laplace Ricky Dec 08 '21 at 12:03
  • That certainly changes the picture for me. I am familiar with @tf.function, which I do use, but was not aware that it was parallelizing in the background. This is good news for the short term as we will be on CPU. And by the time the future arrives we may be doing something totally different anyway. Thanks! – Morten Grum Dec 08 '21 at 12:32
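
To illustrate Nick ODell's suggestion in the first comment (compute the gradient of each branch separately, then add the results): below is a minimal sketch, assuming the branches really are just summed. A GradientTape only records operations executed in the thread that created it, so each worker opens its own tape; the helper value_and_grad_in_thread and the results dictionary are illustrative names, not part of the original code.

import tensorflow as tf
import threading

def square(x):
    return x * x

def cube(x):
    return x * x * x

def value_and_grad_in_thread(fn, x, results, key):
    # The tape is created inside the worker thread, so it records the ops
    # that this thread actually executes.
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = fn(x)
    results[key] = (y, tape.gradient(y, x))

x = tf.constant(3.0)
results = {}
t1 = threading.Thread(target=value_and_grad_in_thread, args=(square, x, results, "square"))
t2 = threading.Thread(target=value_and_grad_in_thread, args=(cube, x, results, "cube"))
t1.start(); t2.start()
t1.join(); t2.join()

# Since y = square(x) + cube(x), dy/dx is the sum of the per-thread gradients.
y = results["square"][0] + results["cube"][0]
dy_dx = results["square"][1] + results["cube"][1]
print("y =", y)          # tf.Tensor(36.0, shape=(), dtype=float32)
print("dy_dx =", dy_dx)  # tf.Tensor(33.0, shape=(), dtype=float32)

As the comments note, this manual bookkeeping only stays simple when the combination rule is simple; for a network of interdependent models it quickly stops being "auto".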
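
And a minimal sketch of Laplace Ricky's tf.function suggestion: trace the whole computation, tape included, into one graph and let TensorFlow's CPU runtime schedule the independent operations in parallel through its inter-op thread pool, with no Python threads at all. The wrapper name my_function_and_grad is an illustrative choice, not from the original code.

import tensorflow as tf

@tf.function
def my_function_and_grad(x):
    # Inside the traced graph, x*x and x*x*x are independent nodes that the
    # runtime may execute concurrently on CPU; the tape still records them all.
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = x * x + x * x * x
    return y, tape.gradient(y, x)

x = tf.constant(3.0)
y, dy_dx = my_function_and_grad(x)
print("y =", y)          # tf.Tensor(36.0, shape=(), dtype=float32)
print("dy_dx =", dy_dx)  # tf.Tensor(33.0, shape=(), dtype=float32)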
