
Assume we don't have any GPUs. We have a single machine which has one CPU with N cores. I want to train a neural network on the CPU and want to utilize the CPU as much as possible.

I know that I can set the inter_op_parallelism_threads and intra_op_parallelism_threads variables to control parallelism in TensorFlow. But I want to know: how does TensorFlow handle parallelism across CPU cores by default?

My initial guess is that, by default, TensorFlow sees all the CPU cores as one device and runs a single operation concurrently across all the cores using the Eigen library, with no inter_op parallelism (i.e., no running of multiple ops in parallel).

I know about this question, but it is not what I'm looking for: I want to know how TensorFlow itself handles parallelism across CPU cores.
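For what it's worth, the relevant settings can be inspected directly. A minimal sketch (assuming the TF 2.x `tf.config.threading` API; the `try`/`except` is only so the core-count part runs even without TensorFlow installed, and a value of 0 means "let TensorFlow choose based on the core count"):

```python
import os

# Both TensorFlow thread-pool settings default to 0, which means
# "size the pool automatically based on the number of logical CPU cores".
print("logical cores:", os.cpu_count())

try:
    import tensorflow as tf  # sketch, assuming TensorFlow 2.x is installed

    # 0 here means "choose automatically"; any positive value is an explicit cap.
    print("intra-op:", tf.config.threading.get_intra_op_parallelism_threads())
    print("inter-op:", tf.config.threading.get_inter_op_parallelism_threads())
except ImportError:
    pass  # TensorFlow not available; the core count above still prints
```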

Hamed
  • If you want to optimize for CPU, you should probably use the MKL build (the default conda version on Linux, I believe); it will use as many cores as it can by default. – Yaroslav Bulatov Nov 07 '19 at 03:04
  • Thanks @YaroslavBulatov, does that mean TensorFlow does not use inter_op_parallelism_threads by default on a multi-core CPU? – Hamed Nov 07 '19 at 15:26
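As a follow-up to the MKL comment above: with an MKL build, CPU parallelism is also influenced by OpenMP environment variables. A hedged sketch (the specific values are illustrative, not tuned recommendations; KMP_BLOCKTIME and KMP_AFFINITY are the knobs Intel's guides commonly mention, and they must be set before TensorFlow is first imported):

```python
import os

# Illustrative values only; set these BEFORE importing TensorFlow.
os.environ["OMP_NUM_THREADS"] = str(os.cpu_count() or 1)  # threads per parallel region
os.environ["KMP_BLOCKTIME"] = "0"  # ms a thread spin-waits after finishing work
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin threads to cores

print(os.environ["OMP_NUM_THREADS"])
```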

1 Answer


I had a similar problem and posted my solution here. I use an Intel processor, which has some CPU optimization parameters (OpenMP) for threading with Keras processes:

import os
os.environ["OMP_NUM_THREADS"] = "16"  # set before importing TensorFlow/Keras
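To round out the answer, a sketch combining the OpenMP variable with TensorFlow's own thread-pool settings (the thread counts are placeholders for a 16-core machine; ConfigProto is the TF 1.x-era interface, still available under tf.compat.v1). OpenMP reads OMP_NUM_THREADS when its runtime initializes, so the variable has to be exported before TensorFlow is first imported:

```python
import os

os.environ["OMP_NUM_THREADS"] = "16"  # placeholder count; set BEFORE importing TF

try:
    import tensorflow as tf

    config = tf.compat.v1.ConfigProto(
        intra_op_parallelism_threads=16,  # threads used inside a single op
        inter_op_parallelism_threads=2,   # independent ops run concurrently
    )
    sess = tf.compat.v1.Session(config=config)
except ImportError:
    pass  # TensorFlow not installed; the environment variable is still set
```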