TL;DR: an op created inside a with tf.device("/gpu:0") block will always run on GPU. If you explicitly place its inputs on CPU, they will get placed on CPU; if you omit device specifications for the inputs, they will get placed on GPU to be closer to your op. You can use run_metadata to get a Python object with all device assignments, and look your op up there.
Placement is done by the (misleadingly named) simple_placer.cc, and while its comments specify the mechanics, there are still some bugs being ironed out (i.e., here), so the best way is to check it in practice.
When you say that variables are created on GPU, there are actually two kinds of placement -- explicit, when you create the relevant op inside a with tf.device block, and implicit, when you create it outside of such a block. Creating ops outside of a with tf.device block is equivalent to creating them in a with tf.device(None) block.
So here's a simple experiment:

import tensorflow as tf

n = 10**6

def inputs_cpu():
    tf.reset_default_graph()
    with tf.device("/cpu:0"):
        a = tf.ones((n,), name="A")
        b = tf.ones((n,), name="B")
    with tf.device("/gpu:0"):
        c = tf.add(a, b, name="C")
    return c

def inputs_none():
    tf.reset_default_graph()
    a = tf.ones((n,), name="A")
    b = tf.ones((n,), name="B")
    with tf.device("/gpu:0"):
        c = tf.add(a, b, name="C")
    return c
def run_and_summarize(target):
    # turn off graph-rewriting optimizations
    sess = tf.Session(config=tf.ConfigProto(
        graph_options=tf.GraphOptions(
            optimizer_options=tf.OptimizerOptions(
                opt_level=tf.OptimizerOptions.L0))))
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(target, options=run_options, run_metadata=run_metadata)
    for device in run_metadata.step_stats.dev_stats:
        device_name = device.device
        if not (device_name.endswith("/cpu:0") or device_name.endswith("/gpu:0")):
            continue
        print(device_name)
        for node in device.node_stats:
            print("   ", node.node_name)
Now you can do this:

run_and_summarize(inputs_cpu())

That runs with the inputs pinned to CPU, and you'll see that this placement is respected:
/job:localhost/replica:0/task:0/gpu:0
_SOURCE
C
/job:localhost/replica:0/task:0/cpu:0
_SOURCE
A
B
On the other hand, when devices for the inputs are not specified:

run_and_summarize(inputs_none())

you can see that now all ops are placed on GPU:
/job:localhost/replica:0/task:0/cpu:0
_SOURCE
/job:localhost/replica:0/task:0/gpu:0
_SOURCE
A
B
C