TL;DR: an op created inside a with tf.device("/gpu:0") block will always run on GPU. If you explicitly place its inputs on CPU, they will get placed on CPU; if you omit device specifications for the inputs, they will get placed on GPU to be closer to your op. You can use run_metadata to get a Python object with all device assignments, and look your op up there.
Placement is done by the (misleadingly named) simple_placer.cc, and while its comments specify the mechanics, there are still some bugs being ironed out (i.e., here), so the best way is to check it in practice.
When you say that variables are created on GPU, there are actually two kinds of placement -- explicit, when you create the relevant op inside a with tf.device block, and implicit, when you create it outside of such a block. Creating ops outside of a with tf.device block is equivalent to creating them in a with tf.device(None) block.
So here's a simple experiment:

import tensorflow as tf

n = 10**6

def inputs_cpu():
    tf.reset_default_graph()
    with tf.device("/cpu:0"):
        a = tf.ones((n,), name="A")
        b = tf.ones((n,), name="B")
    with tf.device("/gpu:0"):
        c = tf.add(a, b, name="C")
    return c

def inputs_none():
    tf.reset_default_graph()
    a = tf.ones((n,), name="A")
    b = tf.ones((n,), name="B")
    with tf.device("/gpu:0"):
        c = tf.add(a, b, name="C")
    return c
def run_and_summarize(target):
    # turn off graph-rewriting optimizations
    sess = tf.Session(config=tf.ConfigProto(
        graph_options=tf.GraphOptions(
            optimizer_options=tf.OptimizerOptions(
                opt_level=tf.OptimizerOptions.L0))))
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(target, options=run_options, run_metadata=run_metadata)
    for device in run_metadata.step_stats.dev_stats:
        device_name = device.device
        if not (device_name.endswith("/cpu:0") or device_name.endswith("/gpu:0")):
            continue
        print(device_name)
        for node in device.node_stats:
            print("   ", node.node_name)
Now you can do this:

run_and_summarize(inputs_cpu())

That runs with the inputs pinned to CPU, and you'll see that this placement is respected:
/job:localhost/replica:0/task:0/gpu:0
_SOURCE
C
/job:localhost/replica:0/task:0/cpu:0
_SOURCE
A
B
On the other hand, when devices for the inputs are not specified:

run_and_summarize(inputs_none())

you can see that now all ops are placed on GPU:
/job:localhost/replica:0/task:0/cpu:0
_SOURCE
/job:localhost/replica:0/task:0/gpu:0
_SOURCE
A
B
C