
I am trying to run a Python script that trains several neural networks using TensorFlow and Keras. The problem is that I cannot restrict the number of cores used on the server, even though it works on my local desktop.

The basic structure is that I have defined a function run_net that runs the neural net. This function is called with different parameters in parallel using joblib (see below). Additionally, I have tried running the function sequentially with different parameters, which didn't solve the problem.

Parallel(n_jobs=1, backend="multiprocessing")(
    delayed(run_net)(params) for params in param_list  # param_list stands in for my actual parameter sets
)

If I run that on my local Windows desktop, everything works fine. However, if I run the same script on our institute's server with 48 cores and check CPU usage with htop, all cores are used. I already tried setting n_jobs in joblib's Parallel to 1, and CPU usage still goes to 100% on all cores as soon as the TensorFlow models start training.

I have already searched for different solutions, and the main one I found is shown below. I place it before running the parallel jobs shown above. I also tried placing it before every fit or predict call of the model.

import tensorflow as tf

# In TF 2.x the session-based Keras backend lives under compat.v1
K = tf.compat.v1.keras.backend

NUM_PARALLEL_EXEC_UNITS = 5
config = tf.compat.v1.ConfigProto(
    intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
    inter_op_parallelism_threads=2,
    device_count={"CPU": NUM_PARALLEL_EXEC_UNITS},
)
session = tf.compat.v1.Session(config=config)
K.set_session(session)
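
For context, this is roughly how the pieces are laid out in my script. It is a simplified sketch; run_net's body and param_list are just placeholders for my actual code:

import tensorflow as tf
from joblib import Parallel, delayed

NUM_PARALLEL_EXEC_UNITS = 5

# Thread limits are applied once, before any model is built or trained
config = tf.compat.v1.ConfigProto(
    intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
    inter_op_parallelism_threads=2,
    device_count={"CPU": NUM_PARALLEL_EXEC_UNITS},
)
tf.compat.v1.keras.backend.set_session(tf.compat.v1.Session(config=config))

def run_net(params):
    ...  # build, fit and evaluate one Keras model with these parameters

param_list = [{"units": 32}, {"units": 64}]  # hypothetical parameter sets

if __name__ == "__main__":
    Parallel(n_jobs=1, backend="multiprocessing")(
        delayed(run_net)(params) for params in param_list
    )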

At this point, I am quite lost and have no idea how to make TensorFlow and/or Keras use a limited number of cores, as the server I am using is shared across the institute.

The server is running Linux; however, I don't know which exact distribution/version it is. I am very new to running code on a server.
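
If the exact distribution matters, I can look it up from Python itself; this is just the quick check I would run on the server, not part of my training script:

import platform

print(platform.platform())  # kernel and architecture, e.g. "Linux-...-x86_64-..."
print(platform.freedesktop_os_release().get("PRETTY_NAME"))  # distribution name (Python 3.10+)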

These are the versions I am using:

python == 3.10.8
tensorflow == 2.10.0
keras == 2.10.0

If you need any other information, I am happy to provide that.

Edit 1

Neither the answer suggested in this thread nor using only the following commands works:

tf.config.threading.set_intra_op_parallelism_threads(5)
tf.config.threading.set_inter_op_parallelism_threads(5)
  • Does this answer your question? [How can I reduce the number of CPUs used by Tensorlfow/Keras?](https://stackoverflow.com/questions/57925061/how-can-i-reduce-the-number-of-cpus-used-by-tensorlfow-keras) – mujjiga Feb 04 '23 at 11:33
  • Do you use a queueing system like slurm? if not, you should use it and also can limit the number of CPUs/GPUs exposed to the APIs. – Dr. Snoopy Feb 04 '23 at 11:36
  • @Dr.Snoopy No, I don't use a queueing system. I thought I could simply run the code that works on my Windows Desktop on the server. Also, I am not the admin of the server but simply have access to the resources. Thus, I am not sure if I am allowed/have the rights to use something like slurm. – Lutz Köhler Feb 04 '23 at 12:45
  • @mujjiga No, this didn't work in my case. I added that to the original question. – Lutz Köhler Feb 04 '23 at 12:46

1 Answer


After trying some things, I have found a solution to my problem. With the following code, I can restrict the number of CPUs used:

import os
os.environ["OMP_NUM_THREADS"] = "5"  # set before importing tensorflow

import tensorflow as tf
tf.config.threading.set_intra_op_parallelism_threads(5)
tf.config.threading.set_inter_op_parallelism_threads(5)

Note that I have no idea how many CPUs will be used in the end. I noticed that more than five cores are being used. As I don't really care about the exact number of cores, but only about not using all of them, I am fine with this solution for now. If anybody knows how to calculate the number of cores used from the information provided above, let me know.
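
If a hard upper bound on the cores is needed, I believe the whole process (and every thread it spawns) can additionally be pinned to a fixed set of cores on Linux. This is only a sketch of that idea, not something I have verified on the server:

import os

# Linux only: restrict this process and all of its threads to cores 0-4
os.sched_setaffinity(0, {0, 1, 2, 3, 4})
print(len(os.sched_getaffinity(0)), "cores available to this process")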