
Whether I specify `--ntasks=3` and `--cpus-per-task=40` or `--ntasks=1` and `--cpus-per-task=40` (SLURM), the code takes exactly the same time (99 seconds) to run. What am I missing?

I do see a speedup when going from `--cpus-per-task=20` to `--cpus-per-task=40` (194 seconds vs 99 seconds), which makes sense: a twofold decrease in time with twice as many CPUs.

I do have 40 CPUs per node available.

Here is my MRE:

import multiprocessing as mp
import openpyxl
import os
import time
from multiprocessing import Lock


def write_to_excel(workbook, sheet_name, row, col, data, mylock):
    # just some stuff to make the calculation last a long time
    for k in range(15_000):
        for j in range(15_000):
            a = k + j
            if a % 2 == 0:
                a = a + 1
            else:
                a = a - 1
            if a is None:
                print(a)
    with mylock:
        # Open the shared workbook in read-write mode
        wb = openpyxl.load_workbook(workbook)
        # Get the sheet
        sheet = wb[sheet_name]
        # Write the data to the specified cell
        sheet.cell(row=row, column=col, value=data)
        # Save the changes to the workbook
        wb.save(workbook)


if __name__ == "__main__":
    start_time = time.time()
    # Create a new Excel workbook
    wb = openpyxl.Workbook()
    wb.save("shared_workbook.xlsx")

    mylock = Lock()

    # Get the number of tasks and CPUs per task from environment variables
    num_tasks = int(os.getenv("SLURM_NTASKS", 1))
    cpus_per_task = int(os.getenv("SLURM_CPUS_PER_TASK", 1))

    print(f"num_tasks: {num_tasks}")  # output is coherent with my slurm script
    print(f"cpus_per_task: {cpus_per_task}")  # output is coherent with my slurm script

    # Calculate the total number of processes
    num_processes = num_tasks * cpus_per_task
    print(f"num_processes: {num_processes}")  # output is coherent with my slurm script

    # Number of parallel processes to create
    num_processes_to_have = 102

    # Start the processes
    processes = []
    for i in range(num_processes_to_have):
        process = mp.Process(
            target=write_to_excel,
            args=(
                "shared_workbook.xlsx",
                "Sheet",
                i + 1,
                1,
                f"Data from process {i + 1}",
                mylock,
            ),
        )
        processes.append(process)
        process.start()

    # Wait for all processes to finish
    for process in processes:
        process.join()

    print("Writing to shared workbook complete.", time.time() - start_time)

My slurm script looks like this:

#SBATCH --job-name=#####
#SBATCH --output=#####
#SBATCH --time=1:00:00
#SBATCH --mem=8G
#SBATCH --partition=#####
#SBATCH --mail-user=#####
#SBATCH --mail-type=#####
#SBATCH --export=NONE
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=20

1 Answer


If my assumption is correct, your statement "I do have 40 CPUs per node available" answers your question. I assume you are running your experiments on a single node.

> Whether I specify `--ntasks=3` and `--cpus-per-task=40` or `--ntasks=1` and `--cpus-per-task=40` (SLURM), the code takes exactly the same time (99 seconds) to run. What am I missing?

Here the allocation is 120 CPUs (3 tasks * 40 cpus-per-task) versus 40 CPUs (1 task * 40 cpus-per-task), but your script always starts 102 processes on the single node it runs on. In effect, since that node has a capacity of 40 cores, you cannot improve performance by raising the number of processes above the number of cores.

Why? Because once you have 100+ processes running on 40 cores, a lot of context switching is needed to execute your code, so the performance improvement won't be much (if your code is not optimised) compared to 40 processes running on 40 cores. This also depends on the application: with a master-worker model you might see some improvement, but not a huge one.
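
As an illustration, here is a minimal sketch of the same kind of workload with the worker count capped at the allocation size. The helper name `busy_work` and the use of `multiprocessing.Pool` are my choices, not your original code: all 102 items are queued, but no more of them run at once than there are cores.

import multiprocessing as mp
import os


def busy_work(i):
    # same kind of CPU-bound loop as in the MRE, just shorter
    a = 0
    for k in range(5_000):
        for j in range(5_000):
            a = (k + j) % 2
    return f"Data from item {i}"


if __name__ == "__main__":
    # never start more workers than the cores SLURM gave this task;
    # extra processes only add scheduling overhead, not speed
    cpus_per_task = int(os.getenv("SLURM_CPUS_PER_TASK", mp.cpu_count()))

    with mp.Pool(processes=cpus_per_task) as pool:
        # 102 work items are queued, but at most cpus_per_task
        # of them run at any one time
        results = pool.map(busy_work, range(102))

    print(f"finished {len(results)} items on {cpus_per_task} workers")

A `Pool` sized this way also avoids starting all 102 OS processes at once, which the original script does.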

j23
  • Isn't a new task supposed to get a new node, otherwise what's the point of it (compared to `--cpus-per-task`)? And how can I use several nodes then? Is this not possible with the library `multiprocessing`? – FluidMechanics Potential Flows Mar 21 '23 at 15:11
  • No, it is not trivial. With multiprocessing you can only use it on a single machine. You need to write/modify your code for distributed systems to see useful results. [See this SO thread](https://stackoverflow.com/questions/5181949/using-the-multiprocessing-module-for-cluster-computing) – j23 Mar 21 '23 at 17:22
  • What is the point of `--ntasks=3` then, if we can get all the CPUs with `--ntasks=1`? – FluidMechanics Potential Flows Mar 21 '23 at 17:33
  • With `--ntasks=1` we cannot run more than one task in parallel. For example, if we use `srun sleep 10 & srun sleep 10`, both tasks won't run in parallel if `--ntasks` is 1. So you have to use `--ntasks=2` in the job script and then `srun --ntasks=1 sleep 10 & srun --ntasks=1 sleep 10` to run them in parallel. Basically, it specifies the maximum number of tasks that can be launched in parallel. – j23 Mar 22 '23 at 10:01
  • Super clear, so basically `--ntasks` is useless to me if I'm only using `multiprocessing`. – FluidMechanics Potential Flows Mar 23 '23 at 10:27
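
For completeness, a small diagnostic sketch (not part of the thread's code; how strictly SLURM pins a task to its cores depends on the cluster configuration) shows what a `multiprocessing` script actually sees from inside its allocation:

import multiprocessing as mp
import os

# total cores on the node the script happens to land on
print("cpu_count:", mp.cpu_count())

# cores this process is actually allowed to use; SLURM typically
# restricts the task to its allocated cores via the CPU affinity mask
print("usable cores:", len(os.sched_getaffinity(0)))

# what SLURM granted per task, for comparison
print("SLURM_CPUS_PER_TASK:", os.getenv("SLURM_CPUS_PER_TASK"))

With `--ntasks=3 --cpus-per-task=40` the script still runs on a single node, so the usable-core count stays at 40 no matter how many tasks were requested.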