
I want to distribute the training of my custom Keras model over the cores on my CPU (I do not have GPUs available). My CPU is an i7-7700, which has 4 cores. However, tensorflow only detects 1 core (EDIT: added full console output):

>>> import tensorflow as tf
2020-12-14 15:41:04.517355: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-12-14 15:41:04.517395: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>>> strategy = tf.distribute.MirroredStrategy()
2020-12-14 15:41:23.483267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-14 15:41:23.514702: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-12-14 15:41:23.514745: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (razerblade): /proc/driver/nvidia/version does not exist
2020-12-14 15:41:23.514991: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-14 15:41:23.520064: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2799925000 Hz
2020-12-14 15:41:23.520407: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x42dc250 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-14 15:41:23.520461: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
>>> strategy.num_replicas_in_sync
1

How do I make tensorflow detect the 4 cores?

I am running Python 3.8.5, Tensorflow 2.3.1 on Ubuntu 20.04.

lpdk
  • You don't need to use strategies for TensorFlow to use all CPU cores; it is automatic. CPU cores are not each a single TensorFlow device, which is why it does not work the way you expect (see the sketch after these comments). – Dr. Snoopy Dec 14 '20 at 16:09
  • So for strategies to be useful, I need to have multiple GPUs available? – lpdk Dec 14 '20 at 16:41
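As the first comment notes, TensorFlow exposes the whole CPU as a single device and spreads work across cores through its internal thread pools, so the core count never shows up as extra devices. A minimal sketch to see this, assuming TensorFlow 2.x (the printed values are what an i7-7700 would typically show):

import os
import tensorflow as tf

# The whole CPU appears as a single TensorFlow device...
print(tf.config.list_physical_devices('CPU'))
# -> [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

# ...even though the OS sees several logical cores.
print(os.cpu_count())  # 8 on an i7-7700 (4 cores with hyper-threading)

# Parallelism across those cores is governed by thread pools, not devices;
# 0 means "let TensorFlow decide", which by default uses all cores.
print(tf.config.threading.get_intra_op_parallelism_threads())  # 0
print(tf.config.threading.get_inter_op_parallelism_threads())  # 0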

2 Answers


On my PC it looks the same when I run those command lines, but when I run a TensorFlow program it uses all cores; I can see OMP start up when TensorFlow is imported.

Console output:

>>> import tensorflow as tf
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
>>> strategy = tf.distribute.MirroredStrategy()
2020-12-14 14:29:49.674255: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-14 14:29:49.726783: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2020-12-14 14:29:49.727051: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5647bde60ca0 executing computations on platform Host. Devices:
2020-12-14 14:29:49.727064: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-5
OMP: Info #214: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #156: KMP_AFFINITY: 6 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #285: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #285: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #285: KMP_AFFINITY: topology layer "thread" is equivalent to "core".
OMP: Info #191: KMP_AFFINITY: 1 socket x 6 cores/socket x 1 thread/core (6 total cores)
OMP: Info #216: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to socket 0 core 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to socket 0 core 2 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to socket 0 core 3 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to socket 0 core 4 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to socket 0 core 5 
OMP: Info #252: KMP_AFFINITY: pid 94119 tid 94119 thread 0 bound to OS proc set 0
2020-12-14 14:29:49.728234: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Device is available but not used by distribute strategy: /device:XLA_CPU:0
WARNING:tensorflow:Not all devices in `tf.distribute.Strategy` are visible to TensorFlow.
>>> strategy.num_replicas_in_sync
1

So look at your console output and paste it here if you have questions about it.

I don't really know how strategy.num_replicas_in_sync works, but I don't think it reflects how many cores are running.

TensorFlow version: 1.14
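
One way to convince yourself that all cores are used even though num_replicas_in_sync is 1: run a large op and watch CPU usage. A minimal sketch, assuming TF 2.x eager mode (the matrix size is arbitrary):

import tensorflow as tf

# A large matmul saturates TensorFlow's intra-op thread pool, so all
# cores show as busy in top/Task Manager even with one CPU device.
a = tf.random.uniform((4000, 4000))
for _ in range(20):
    a = tf.matmul(a, a) / 4000.0  # rescale so values stay bounded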

Edit:

I recommend you run it as you have it now, then check in the Task Manager whether it is running on more than one CPU core. If it is only using one, continue with this:

import tensorflow as tf
from tensorflow.keras import backend as K  # on TF 2.x use tf.compat.v1.ConfigProto/Session

config = tf.ConfigProto(intra_op_parallelism_threads=1,
                        inter_op_parallelism_threads=1,
                        allow_soft_placement=True,
                        device_count={'CPU': 1})
session = tf.Session(config=config)
K.set_session(session)

See this link: https://github.com/keras-team/keras/issues/4314

Replace the 1s to set the number of threads (intra_op_parallelism_threads=X, inter_op_parallelism_threads=X); the i7-7700 has 8 threads: https://www.bhphotovideo.com/c/product/1304296-REG/intel_bx80677i77700_core_i7_7700_4_2_ghz.html
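
For TF 2.x, which the question uses, the same knobs are available without a Session through tf.config.threading; a minimal sketch, assuming the calls run before any op executes (the thread counts are illustrative):

import tensorflow as tf

# Must be set before TensorFlow runs its first op.
tf.config.threading.set_intra_op_parallelism_threads(8)  # threads inside a single op (e.g. a matmul)
tf.config.threading.set_inter_op_parallelism_threads(2)  # how many independent ops run in parallel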

Luis Bote
  • I have now added the full console output, which differs from yours. I am not sure whether this tells me what the problem is. – lpdk Dec 14 '20 at 14:47

Because you call MirroredStrategy with no arguments, you get just one CPU device. If you want to use more CPU replicas, you need to pass a list of devices to tf.distribute.MirroredStrategy, like this:

import tensorflow as tf
strategy = tf.distribute.MirroredStrategy(["CPU:0", "CPU:1", "CPU:2", "CPU:3", "CPU:4"])
strategy.num_replicas_in_sync

The result: strategy.num_replicas_in_sync now reports 5, one replica per listed device.
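
Note that TensorFlow normally registers only CPU:0, so for the replicas to map onto distinct devices you can first split the physical CPU into logical devices. A minimal sketch, assuming TF 2.4+ (on earlier 2.x the same calls live under tf.config.experimental with "virtual device" names):

import tensorflow as tf

# Split the single physical CPU into 5 logical devices; this must run
# before TensorFlow initializes its devices.
cpu = tf.config.list_physical_devices('CPU')[0]
tf.config.set_logical_device_configuration(
    cpu, [tf.config.LogicalDeviceConfiguration()] * 5)

strategy = tf.distribute.MirroredStrategy(["CPU:0", "CPU:1", "CPU:2", "CPU:3", "CPU:4"])
print(strategy.num_replicas_in_sync)  # 5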

Marc Chan