When defining a network in Caffe/Caffe2, can you place some of the nodes on the CPU and others on GPU? If so, how?
(If your answer pertains to a specific version of Caffe, please specify which.)
No, it's not possible. If you look at the solver.prototxt file, you'll notice that you may specify the mode as either CPU or GPU, but not both. The reason for this execution structure is efficiency. The data generated by each layer of a CNN may run to megabytes; if you keep part of the network on the CPU and part of it on the GPU, you'll need to transfer huge chunks of data back and forth between the devices. That overhead is large enough to completely undo the speedup the GPU provides, so it is more efficient to train the entire network on the GPU rather than on a CPU-GPU combination. Also note that the GPU is connected to the CPU via a PCIe interface, which is significantly slower than the internal CPU bus, so data transfer between the devices is really expensive. That's one of the reasons why larger batch sizes are preferred for training CNNs: a whole batch of images can be sent to the GPU at once, avoiding repetitive memory reads and writes.
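For reference, in Caffe 1.x this is a single global switch in the solver definition (the field is solver_mode in caffe.proto); a minimal illustration:

# solver.prototxt: the mode applies to the whole net, not per layer
solver_mode: GPU   # or CPU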
This might actually be possible in Caffe2, but I have never tested it. In Caffe2, every blob and operator has a device assigned to it, and an operator runs on its assigned device. You would then need to take care of initialization and communication manually, because data_parallel_model in Caffe2 is only equipped for multi-GPU setups.
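If you do experiment with this in Caffe2, the manual communication mentioned above is done with explicit copy operators such as CopyCPUToGPU. A minimal, untested sketch (the blob and net names are made up) that wires a CPU operator into a GPU operator; it only builds the net, it does not run it:

from caffe2.python import core
from caffe2.proto import caffe2_pb2

net = core.Net("mixed_device_sketch")
# Pin the first operator to the CPU.
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU)):
    relu_cpu = net.Relu("x", "relu_cpu")
# Copy the intermediate blob over PCIe, then continue on GPU 0.
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CUDA, 0)):
    relu_gpu = net.CopyCPUToGPU(relu_cpu, "relu_gpu")
    pred = net.Softmax(relu_gpu, "pred")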
Generally speaking, the answer is NO: you cannot configure the device for each layer independently for the reasons Pooya Davoodi and Harsh Wardhan described.
However, if you look at specific layers, you might sometimes get the behavior you are looking for. For instance, if your solver is configured to run on the GPU, but you have a layer in your net that does not have a GPU implementation, then this layer will run on the CPU (with all the overhead described in Harsh Wardhan's answer).
One such layer is a "Python" layer: this layer runs only on the CPU, and you may have your word2vec implementation there.
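For illustration, here is a skeleton of such a layer (the class name and the identity body are made up; the setup/reshape/forward/backward interface is the standard Python layer API):

import caffe

class MyCPUOnlyLayer(caffe.Layer):
    # "Python" layers are always executed on the CPU.
    def setup(self, bottom, top):
        pass  # parse self.param_str, validate bottom/top counts, etc.
    def reshape(self, bottom, top):
        top[0].reshape(*bottom[0].data.shape)
    def forward(self, bottom, top):
        top[0].data[...] = bottom[0].data  # identity placeholder
    def backward(self, top, propagate_down, bottom):
        pass  # no gradient in this sketch

Such a layer is declared in the net prototxt with type: "Python" and a python_param block naming the module and the class.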
Alternatively, you may write your own layers without a GPU implementation, making sure they run only on the CPU.
BTW, are you using Caffe2? Are you okay with their PATENTS clause?!
UPDATE: it seems like FB decided to soften Caffe2's license. Well done!
Use a DeviceScope with the relevant DeviceOption (CPU or GPU device_type) before creating the required node and its blobs.
A simple example:
from caffe2.python import workspace, model_helper
from caffe2.proto import caffe2_pb2
from caffe2.python import core
import numpy as np
m = model_helper.ModelHelper(name="my first net")
data = np.random.rand(16, 100).astype(np.float32)
gpu_device_id = 1
cpu_device_id = -1
with core.DeviceScope(core.DeviceOption(caffe2_pb2.CPU, cpu_device_id)):
    # Feed the input, create the parameters, and create the CPU node
    workspace.FeedBlob("data", data)
    weight = m.param_init_net.XavierFill([], 'fc_w', shape=[10, 100])
    bias = m.param_init_net.ConstantFill([], 'fc_b', shape=[10, ])
    fc_1 = m.net.FC(["data", "fc_w", "fc_b"], "fc1")

with core.DeviceScope(core.DeviceOption(workspace.GpuDeviceType, gpu_device_id)):
    # Create the GPU nodes
    pred = m.net.Sigmoid(fc_1, "pred")
    softmax, loss = m.net.SoftmaxWithLoss([pred, "label"], ["softmax", "loss"])
print(m.net.Proto())
The output is:
name: "my first net"
op {
name: "my first net"
op {
input: "data"
input: "fc_w"
input: "fc_b"
output: "fc1"
name: ""
type: "FC"
device_option {
device_type: 0
device_id: -1
}
}
op {
input: "fc1"
output: "pred"
name: ""
type: "Sigmoid"
device_option {
device_type: 1
device_id: 1
}
}
op {
input: "pred"
input: "label"
output: "softmax"
output: "loss"
name: ""
type: "SoftmaxWithLoss"
device_option {
device_type: 1
device_id: 1
}
}
external_input: "data"
external_input: "fc_w"
external_input: "fc_b"
external_input: "label"