
I've been trying to run Keras on a CPU cluster, and for this I need to limit the number of cores it uses (it's a shared system). To limit the number of cores, I landed on this answer. However, it simply doesn't work. I tried running this basic code:

from keras.applications.vgg16 import VGG16
from keras import backend as K
import numpy as np

conf = K.tf.ConfigProto(device_count={'CPU': 1}, 
                        intra_op_parallelism_threads=2, 
                        inter_op_parallelism_threads=2)
K.set_session(K.tf.Session(config=conf))
model = VGG16(weights='imagenet', include_top=False)
x = np.random.randn(1000, 224, 224, 3)
features = model.predict(x)

When I run this and check htop, it uses all 128 logical cores. Is this a bug in Keras, or am I doing something wrong?

Keras warns that my CPU supports SSE4.1 and SSE4.2 instructions, which are not used because TensorFlow wasn't compiled from source. Will compiling from source also fix the original issue?

EDIT: I've found a workaround when launching the Keras script from a Unix machine:

taskset -c 0-23 python keras_script.py

This runs the script on the first 24 cores of the machine. It works, but it would still be nice if this could be controlled from within Keras/TensorFlow.
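For completeness, the same pinning can be done from inside the Python script with the standard library's os.sched_setaffinity (Linux-only). This is a sketch of the taskset workaround, not a TensorFlow/Keras API, and it should run before the model starts spawning worker threads:

```python
import os

# Programmatic equivalent of `taskset -c 0-23` (Linux-only): pin the
# current process (pid 0 = self) to the first 24 logical cores.
wanted = set(range(24)) & os.sched_getaffinity(0)  # only cores that exist
os.sched_setaffinity(0, wanted)
print(len(os.sched_getaffinity(0)))  # number of cores now usable
```

Note that this limits which cores the process may run on, not how many threads TensorFlow creates, so htop will show fewer busy cores but the thread count is unchanged.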

wouterdobbels

1 Answer


I found this snippet of code that works for me; hope it helps:

from keras import backend as K
import tensorflow as tf
jobs = 2  # number of cores/threads to use
config = tf.ConfigProto(intra_op_parallelism_threads=jobs,
                         inter_op_parallelism_threads=jobs,
                         allow_soft_placement=True,
                         device_count={'CPU': jobs})
session = tf.Session(config=config)
K.set_session(session)
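As an alternative sketch (assuming an MKL/OpenMP build of TensorFlow, and variable names that TF builds consult), thread counts can also be capped via environment variables. They must be set before TensorFlow/Keras is first imported, since they are read at library load time; the values here are illustrative:

```python
import os

# Set thread caps BEFORE importing tensorflow/keras.
os.environ["OMP_NUM_THREADS"] = "2"         # OpenMP/MKL worker threads
os.environ["TF_NUM_INTRAOP_THREADS"] = "2"  # threads within a single op
os.environ["TF_NUM_INTEROP_THREADS"] = "2"  # independent ops run in parallel

# import tensorflow as tf  # import only after the variables are set
```

Equivalently, export the variables in the shell before launching the script.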
Pau
  • After your comment, I tried my original code snippet again (which is very similar to yours), and it seems to work. Even with tensorflow and keras versions that date back to my question, I can't reproduce the bug in the original question (although I don't have the exact environment anymore). So... problem solved I guess. Thanks! – wouterdobbels Aug 08 '19 at 09:59
  • Can you please try it for jobs = 1 ? By doing so I was expecting one and only one thread but I got N threads (N = number of cores). I want to confine TensorFlow to a single thread. Can you please let me know your answer? – fisakhan Aug 24 '20 at 17:47
  • This doesn't work for me with `tensorflow=1.15.0` on an `Ubuntu 20.04.5` machine. The OpenMP output indicates that it detects and uses all my processors (`OMP: Info #156: KMP_AFFINITY: 160 available OS procs OMP: Info #179: KMP_AFFINITY: 4 packages x 20 cores/pkg x 2 threads/core (80 total cores)`). As a result, it takes about 2 or 3 hours to train a model. With `taskset -c 0-1`, however, it takes just 2 or 3 minutes. – user343233 Oct 26 '22 at 08:42
  • Does this work for the case of a tf.data.Dataset made from a generator, which has the GIL problem? Should we set `use_multiprocessing=True` in the fit method in order to leverage all available cores? – haneulkim Aug 02 '23 at 06:00