Assume we don't have any GPUs. We have a single machine with one CPU that has N cores. I want to train a neural network on the CPU and utilize the CPU as much as possible.
I know that I can set the inter_op_parallelism_threads and intra_op_parallelism_threads options to control parallelism in TensorFlow. But I want to know: how does TensorFlow handle parallelism across CPU cores by default?
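For context, this is roughly how I'm setting those two options (a minimal sketch using the TF 1.x session API; the thread counts here are just placeholder values, not a recommendation):

```python
import tensorflow as tf

# Placeholder thread counts, only to illustrate the two knobs I mean.
config = tf.ConfigProto(
    intra_op_parallelism_threads=4,  # threads used *within* a single op (e.g. one matmul)
    inter_op_parallelism_threads=2,  # threads used to run independent ops in parallel
)

with tf.Session(config=config) as sess:
    # ... build and train the model here ...
    pass
```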
My initial guess is that, by default, TensorFlow sees all the CPU cores as one device and uses the Eigen library to run a single operation concurrently across all the cores, with no inter-op parallelism (i.e. no running of multiple ops in parallel).
I know about this question, but it is not what I'm looking for. I want to know how TensorFlow itself handles parallelism across CPU cores.