Limit number of cores used on server for tensorflow 2 and keras

Question

I am trying to run a Python script that trains several neural networks using TensorFlow and Keras. The problem is that I cannot restrict the number of cores used on the server, even though it works on my local desktop.

The basic structure is that I have defined a function run_net that runs the neural net. This function is called in parallel with different parameters using joblib (see below). I have also tried calling the function iteratively (one call after another) with different parameters, which didn't solve the problem.

Parallel(n_jobs=1, backend="multiprocessing")(
            delayed(run_net)

If I run that on my local Windows desktop, everything works fine. However, if I run the same script on our institute's server with 48 cores and check the CPU usage with htop, all cores are used. I already tried setting n_jobs in joblib's Parallel to 1, and it looks like CPU usage goes to 100% as soon as the TensorFlow models are being trained.
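As a complement to watching htop, the process can also report on itself. The following is a minimal sketch using only the standard library and assuming a Linux server; `cpu_report` is a hypothetical helper name, not something from the original script. It returns the total number of cores, the number of cores the process is allowed to run on, and the number of threads the process currently has.

```python
import os


def cpu_report():
    """Return (total cores, cores this process may run on, threads in this process)."""
    total = os.cpu_count() or 1

    # sched_getaffinity exists on Linux only; fall back to the total elsewhere.
    if hasattr(os, "sched_getaffinity"):
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = total

    # On Linux, /proc/self/status contains a "Threads:" line for this process.
    threads = None
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("Threads:"):
                    threads = int(line.split()[1])
                    break
    except OSError:
        pass  # /proc is not available on this platform

    return total, allowed, threads


print(cpu_report())
```

Calling it once before and once after training starts shows whether the thread count jumps when the models begin to fit.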

I already searched for different solutions and the main one that I found is the one below. I define that before running the parallel jobs shown above. I also tried placing the code below before every fit or predict method of the model.

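import tensorflow as tf
# K is assumed to be the TF1-compatible Keras backend, which provides set_session
from tensorflow.compat.v1.keras import backend as K

NUM_PARALLEL_EXEC_UNITS = 5
config = tf.compat.v1.ConfigProto(
    intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
    inter_op_parallelism_threads=2,
    device_count={"CPU": NUM_PARALLEL_EXEC_UNITS},
)
session = tf.compat.v1.Session(config=config)
K.set_session(session)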

At this point, I am quite lost and have no idea how to make TensorFlow and/or Keras use a limited number of cores. This matters because the server I am using is shared across the institute.

The server is running Linux, but I don't know the exact distribution or version. I am very new to running code on a server.

These are the versions I am using:

python == 3.10.8
tensorflow == 2.10.0
keras == 2.10.0

If you need any other information, I am happy to provide that.

Edit 1

Neither the answer suggested in the linked thread nor using only the following commands works:

tf.config.threading.set_intra_op_parallelism_threads(5)
tf.config.threading.set_inter_op_parallelism_threads(5)

Does this answer your question? [How can I reduce the number of CPUs used by Tensorlfow/Keras?](https://stackoverflow.com/questions/57925061/how-can-i-reduce-the-number-of-cpus-used-by-tensorlfow-keras) — mujjiga, Feb 04 '23 at 11:33
Do you use a queueing system like Slurm? If not, you should use one; it can also limit the number of CPUs/GPUs exposed to the APIs. — Dr. Snoopy, Feb 04 '23 at 11:36
@Dr.Snoopy No, I don't use a queueing system. I thought I could simply run the code that works on my Windows Desktop on the server. Also, I am not the admin of the server but simply have access to the resources. Thus, I am not sure if I am allowed/have the rights to use something like slurm. — Lutz Köhler, Feb 04 '23 at 12:45
@mujjiga No, this didn't work in my case. I added that to the original question. — Lutz Köhler, Feb 04 '23 at 12:46

score 0 · Accepted Answer · answered Feb 06 '23 at 10:58

After trying some things, I have found a solution to my problem. With the following code, I can restrict the number of CPUs used:

import os
os.environ["OMP_NUM_THREADS"] = "5"  # set before importing TensorFlow
import tensorflow as tf
tf.config.threading.set_intra_op_parallelism_threads(5)
tf.config.threading.set_inter_op_parallelism_threads(5)

Note that I do not know how many CPUs will be used in the end. I noticed that more than five cores are being used. As I don't really care about the exact number of cores, only that I don't use all of them, I am fine with that solution for now. If anybody knows how to calculate the number of cores used from the information provided above, let me know.
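The settings above limit the sizes of thread pools rather than the number of cores, which may be why more than five cores show activity. If a hard upper bound is wanted, one option on Linux is CPU affinity. This is a different technique from the one in this answer, shown as a minimal sketch under the assumption that the server runs Linux; `limit_cpu_cores` is a hypothetical helper name. It would need to run before TensorFlow creates its threads, since threads started afterwards inherit the restriction.

```python
import os


def limit_cpu_cores(num_cores):
    """Pin the current process to at most num_cores of the cores it may use.

    Returns the resulting set of core IDs, or None if CPU affinity is not
    supported on this platform (for example Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None

    # Start from the cores the process is currently allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    chosen = set(allowed[:max(1, num_cores)])

    # 0 means "the calling process"; threads created later inherit this mask.
    os.sched_setaffinity(0, chosen)
    return os.sched_getaffinity(0)


print(limit_cpu_cores(5))
```

With this in place, htop should show load on at most the chosen cores, regardless of how many threads the process starts.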
