11

Assume we generate our own training data (for example, by sampling from some diffusion process and computing quantities of interest on it), and that we have our own CUDA routine, generate_data, which generates labels in GPU memory for a given set of inputs.

Hence, we are in a special setting where we can generate as many batches of training data as we want, in an "online" fashion: at each batch iteration we call generate_data to produce a new batch and discard the old one.

Since the data is generated on the GPU, is there a way to make TensorFlow (the Python API) use it directly during training (for example, to fill a placeholder)? That would make such a pipeline efficient.

My understanding is that in such a setup you would currently need to copy the data from GPU to CPU, and then let TensorFlow copy it back from CPU to GPU, which is rather wasteful since unnecessary copies are performed.

EDIT: if it helps, we can assume that the CUDA routine is implemented using Numba's CUDA JIT compiler.
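
For concreteness, here is a minimal sketch of the kind of routine we have in mind (the kernel body, the shapes, and the labels_placeholder name are hypothetical; the point is that generate_data leaves its output in a Numba device array, and today it has to be copied back to the host before being fed to TensorFlow):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def _label_kernel(inputs, labels):
        # Hypothetical per-element computation; stands in for the real
        # quantity of interest computed on the diffusion samples.
        idx = cuda.grid(1)
        if idx < inputs.shape[0]:
            labels[idx] = 2.0 * inputs[idx]

    def generate_data(inputs_device):
        # Allocate the labels directly in GPU memory and fill them with the kernel.
        labels_device = cuda.device_array(inputs_device.shape[0], dtype=np.float32)
        threads = 256
        blocks = (inputs_device.shape[0] + threads - 1) // threads
        _label_kernel[blocks, threads](inputs_device, labels_device)
        return labels_device

    # Current (wasteful) pattern: GPU -> CPU copy, then TensorFlow copies CPU -> GPU again.
    inputs_device = cuda.to_device(np.random.rand(1024).astype(np.float32))
    labels_host = generate_data(inputs_device).copy_to_host()
    # session.run(train_op, feed_dict={labels_placeholder: labels_host})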

GZ0
  • 4,055 • 1 • 10 • 21
BS.
  • 123 • 10
  • Possible dup: https://stackoverflow.com/questions/42032331/can-two-process-shared-same-gpu-memory-cuda –  Jul 03 '19 at 08:50
  • 1
    This would be an interesting feature, as interacting with external GPU data would open a lot of possibilities, but I don't think there is anything remotely like that currently. TensorFlow uses CUDA through a number of layers and abstractions in C++. I'd say you would almost definitely need at least to [write a custom op](https://www.tensorflow.org/guide/extend/op) for this, and I'm not sure if it would be possible without further modifications to the library. – jdehesa Jul 03 '19 at 13:56

1 Answer

2

This is definitely not a complete answer, but hopefully it can help.

  • You can integrate your CUDA routine into TensorFlow by writing a custom op. There is currently no other way in TensorFlow to interact with external CUDA routines (a sketch of how such an op could be loaded from Python is given after the loop example below).

  • As for writing a training loop that runs entirely on the GPU, we can express it in the graph with tf.while_loop, in a very similar way to this SO question:

    import tensorflow as tf

    n = 10000  # total number of training iterations (placeholder value)
    i = tf.Variable(0, name='loop_i')

    def cond(i):
        return i < n

    def body(i):
        # Build the graph for the custom data-generation routine and our model
        x, ground_truth = CustomCUDARoutine(random_seed, ...)
        predictions = MyModel(x, ...)

        # Define the loss and the optimizer step
        loss = loss_func(ground_truth, predictions)
        optim = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

        # Loop body: increment the counter only after the optimizer step has run
        return tf.tuple([tf.add(i, 1)], control_inputs=[optim])

    loop = tf.while_loop(cond, body, [i])

    # Run the whole training loop in a single session call
    tf.get_default_session().run(loop)
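
To connect the two points above: once the CUDA routine has been wrapped as a custom op and compiled into a shared library, it could be loaded from Python and used as the CustomCUDARoutine in the loop. This is only a sketch under that assumption; the library name generate_data_op.so, the op name GenerateData, and its seed/batch_size attributes are all placeholders:

    import tensorflow as tf

    # Load the compiled custom op; the library and op names are placeholders.
    data_module = tf.load_op_library('./generate_data_op.so')

    def CustomCUDARoutine(random_seed, batch_size=256):
        # The custom op runs the CUDA code and returns tensors that already
        # live in GPU memory, so no GPU -> CPU -> GPU round-trip is needed.
        x, ground_truth = data_module.generate_data(seed=random_seed,
                                                    batch_size=batch_size)
        return x, ground_truth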
    
Chan Kha Vu
  • 9,834 • 6 • 32 • 64