
I want to distribute the training of my custom Keras model over the cores on my CPU (I do not have GPUs available). My CPU is an i7-7700, which has 4 cores. However, tensorflow only detects 1 core (EDIT: added full console output):

>>> import tensorflow as tf
2020-12-14 15:41:04.517355: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-12-14 15:41:04.517395: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
>>> strategy = tf.distribute.MirroredStrategy()
2020-12-14 15:41:23.483267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-12-14 15:41:23.514702: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-12-14 15:41:23.514745: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (razerblade): /proc/driver/nvidia/version does not exist
2020-12-14 15:41:23.514991: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-14 15:41:23.520064: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2799925000 Hz
2020-12-14 15:41:23.520407: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x42dc250 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-14 15:41:23.520461: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
>>> strategy.num_replicas_in_sync
1

How do I make tensorflow detect the 4 cores?

I am running Python 3.8.5, Tensorflow 2.3.1 on Ubuntu 20.04.

lpdk
  • You don't need to use strategies for TensorFlow to use all CPU cores; it is automatic. CPU cores are not each a single TensorFlow device, which is why it does not work the way you expect (see the sketch after these comments). – Dr. Snoopy Dec 14 '20 at 16:09
  • So for strategies to be useful, I need to have multiple GPUs available? – lpdk Dec 14 '20 at 16:41
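As the first comment notes, TensorFlow exposes the whole CPU as a single device and spreads work across cores through its internal thread pools, so the core count never shows up as extra devices. A minimal sketch to see this, assuming TensorFlow 2.x (the printed values are what an i7-7700 would typically show):

import os
import tensorflow as tf

# The whole CPU appears as a single TensorFlow device...
print(tf.config.list_physical_devices('CPU'))
# -> [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

# ...even though the OS sees several logical cores.
print(os.cpu_count())  # 8 on an i7-7700 (4 cores with hyper-threading)

# Parallelism across those cores is governed by thread pools, not devices;
# 0 means "let TensorFlow decide", which by default uses all cores.
print(tf.config.threading.get_intra_op_parallelism_threads())  # 0
print(tf.config.threading.get_inter_op_parallelism_threads())  # 0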

2 Answers


On my PC it looks the same when I run those command lines, but when I run a TensorFlow program it uses all cores; I can see OMP start up when TensorFlow is imported.

Console output:

>>> import tensorflow as tf
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ecapture/anaconda3/envs/cpu/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
>>> strategy = tf.distribute.MirroredStrategy()
2020-12-14 14:29:49.674255: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-14 14:29:49.726783: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2020-12-14 14:29:49.727051: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5647bde60ca0 executing computations on platform Host. Devices:
2020-12-14 14:29:49.727064: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-5
OMP: Info #214: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #156: KMP_AFFINITY: 6 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #285: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L3 cache" is equivalent to "socket".
OMP: Info #285: KMP_AFFINITY: topology layer "L2 cache" is equivalent to "core".
OMP: Info #285: KMP_AFFINITY: topology layer "L1 cache" is equivalent to "core".
OMP: Info #285: KMP_AFFINITY: topology layer "thread" is equivalent to "core".
OMP: Info #191: KMP_AFFINITY: 1 socket x 6 cores/socket x 1 thread/core (6 total cores)
OMP: Info #216: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to socket 0 core 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to socket 0 core 2 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to socket 0 core 3 
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to socket 0 core 4 
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to socket 0 core 5 
OMP: Info #252: KMP_AFFINITY: pid 94119 tid 94119 thread 0 bound to OS proc set 0
2020-12-14 14:29:49.728234: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Device is available but not used by distribute strategy: /device:XLA_CPU:0
WARNING:tensorflow:Not all devices in `tf.distribute.Strategy` are visible to TensorFlow.
>>> strategy.num_replicas_in_sync
1

So look at your console output and paste it here if you have questions about it.

I don't really know how strategy.num_replicas_in_sync works, but I don't think it reflects how many cores are running.

TensorFlow version: 1.14
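
One way to convince yourself that all cores are used even though num_replicas_in_sync is 1: run a large op and watch CPU usage. A minimal sketch, assuming TF 2.x eager mode (the matrix size is arbitrary):

import tensorflow as tf

# A large matmul saturates TensorFlow's intra-op thread pool, so all
# cores show as busy in top/Task Manager even with one CPU device.
a = tf.random.uniform((4000, 4000))
for _ in range(20):
    a = tf.matmul(a, a) / 4000.0  # rescale so values stay bounded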

Edit:

I recommend you run it as you have it now, then check in the Task Manager whether it is running on more than one CPU core. If it is only using one, continue with this:

import tensorflow as tf
from tensorflow.keras import backend as K  # on TF 2.x use tf.compat.v1.ConfigProto/Session

config = tf.ConfigProto(intra_op_parallelism_threads=1,
                        inter_op_parallelism_threads=1,
                        allow_soft_placement=True,
                        device_count={'CPU': 1})
session = tf.Session(config=config)
K.set_session(session)

See this link: https://github.com/keras-team/keras/issues/4314

Replace the 1s to set the number of threads (intra_op_parallelism_threads=X, inter_op_parallelism_threads=X); the i7-7700 has 8 threads: https://www.bhphotovideo.com/c/product/1304296-REG/intel_bx80677i77700_core_i7_7700_4_2_ghz.html
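
For TF 2.x, which the question uses, the same knobs are available without a Session through tf.config.threading; a minimal sketch, assuming the calls run before any op executes (the thread counts are illustrative):

import tensorflow as tf

# Must be set before TensorFlow runs its first op.
tf.config.threading.set_intra_op_parallelism_threads(8)  # threads inside a single op (e.g. a matmul)
tf.config.threading.set_inter_op_parallelism_threads(2)  # how many independent ops run in parallel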

Luis Bote
  • I have now added the full console output, which differs from yours. I am not sure whether this tells me what the problem is. – lpdk Dec 14 '20 at 14:47

Because you call MirroredStrategy with no arguments, you get just one CPU device. If you want to use more CPU replicas, you need to pass a list of devices to tf.distribute.MirroredStrategy, like this:

import tensorflow as tf
strategy = tf.distribute.MirroredStrategy(["CPU:0", "CPU:1", "CPU:2", "CPU:3", "CPU:4"])
strategy.num_replicas_in_sync

The result: strategy.num_replicas_in_sync now reports 5, one replica per listed device.
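
Note that TensorFlow normally registers only CPU:0, so for the replicas to map onto distinct devices you can first split the physical CPU into logical devices. A minimal sketch, assuming TF 2.4+ (on earlier 2.x the same calls live under tf.config.experimental with "virtual device" names):

import tensorflow as tf

# Split the single physical CPU into 5 logical devices; this must run
# before TensorFlow initializes its devices.
cpu = tf.config.list_physical_devices('CPU')[0]
tf.config.set_logical_device_configuration(
    cpu, [tf.config.LogicalDeviceConfiguration()] * 5)

strategy = tf.distribute.MirroredStrategy(["CPU:0", "CPU:1", "CPU:2", "CPU:3", "CPU:4"])
print(strategy.num_replicas_in_sync)  # 5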

Marc Chan