I am working on a dual-processor Windows machine and am trying to run several independent Python processes using the multiprocessing library. I want to make full use of both CPUs in order to reduce computation time. The details of my machine are below:
- OS: Windows 10 Pro for Workstations
- RAM: 524 GB
- Hard Drive: Samsung SSD PRO 960 (NVMe)
- CPU: 2x Xeon Gold 6154
I execute a master script using Python 3.6, which then spawns 72 memory-independent workers using the multiprocessing library. Initially, all 72 logical cores of my machine are used at 100%. After about 5-10 minutes, however, all 36 logical cores on my second CPU drop to 0% usage, while the 36 logical cores on the first CPU remain at 100%. I can't figure out why this is happening.
Is there something I am missing regarding the utilization of both CPUs in a dual-processor Windows machine? How can I ensure that the full potential of my machine is utilized? As a side note, I'm curious whether this would behave differently on Linux. Thank you in advance to anyone who is willing to help with this.
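One thing that might be worth checking on a machine like this is how Windows has grouped the logical processors. Windows places at most 64 logical processors in one processor group, so a 72-thread system is split into more than one group. Whether that grouping is related to the drop in usage described above is an assumption, not something confirmed here. The sketch below only reports the group sizes through the Win32 functions `GetActiveProcessorGroupCount` and `GetActiveProcessorCount`, called via `ctypes`. The helper name `processor_group_sizes` is hypothetical.

```python
import ctypes
import sys


def processor_group_sizes():
    """Return the number of active logical processors in each Windows
    processor group, or None when not running on Windows."""
    if sys.platform != "win32":
        # Processor groups are a Windows concept; nothing to report elsewhere.
        return None

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

    # WORD GetActiveProcessorGroupCount(void)
    kernel32.GetActiveProcessorGroupCount.argtypes = []
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort

    # DWORD GetActiveProcessorCount(WORD GroupNumber)
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_ulong

    group_count = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(g) for g in range(group_count)]


if __name__ == "__main__":
    # Prints a list with one entry per processor group on Windows,
    # or None on other platforms.
    print(processor_group_sizes())
```

If this reports two groups, comparing the group sizes with the set of cores that go idle in Task Manager could help narrow down where to look next.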
A representation of my Python master script is below:
import pandas as pd
import netCDF4 as nc
from multiprocessing import Pool

WEATHERDATAPATH = "C:/Users/..../weatherdata/weatherfile_%s.nc4"
OUTPUTPATH = "C:/Users/....outputs/result_%s.nc4"

def calculationFunction(year):
    dataset = nc.Dataset(WEATHERDATAPATH%year)

    # Read the data
    data1 = dataset["windspeed"][:]
    data2 = dataset["pressure"][:]
    data3 = dataset["temperature"][:]
    timeindex = nc.num2date(dataset["time"][:], dataset["time"].units)

    # Do computations with the data, primarily relying on NumPy
    data1Mean = data1.mean(axis=1)
    data2Mean = data2.mean(axis=1)
    data3Mean = data3.mean(axis=1)

    # Write result to a file
    result = pd.DataFrame( {"windspeed":data1Mean,
                            "pressure":data2Mean,
                            "temperature":data3Mean,},
                           index=timeindex )
    result.to_csv(OUTPUTPATH%year)

if __name__ == '__main__':
    pool = Pool(72)

    results = []
    for year in range(1900,2016):
        results.append( pool.apply_async(calculationFunction, (year, )))

    for r in results: r.get()
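As a side check on the script above, the loop submits one task per year, and `range(1900,2016)` contains 116 years, which is fewer than two full rounds for 72 workers. The sketch below works out how many workers would be busy in each round, under the simplifying assumption that every year takes roughly the same time to process (this is not confirmed by anything measured above). The helper name `tasks_per_wave` is hypothetical.

```python
def tasks_per_wave(n_tasks, n_workers):
    """Split n_tasks into successive waves of at most n_workers tasks each.

    Assumes all tasks take about the same time, so that workers finish
    and pick up new tasks in lockstep.
    """
    waves = []
    remaining = n_tasks
    while remaining > 0:
        running = min(remaining, n_workers)
        waves.append(running)
        remaining -= running
    return waves


if __name__ == "__main__":
    years = range(1900, 2016)
    print(len(years))                      # 116
    print(tasks_per_wave(len(years), 72))  # [72, 44]
```

Under that assumption, 28 of the 72 workers would have nothing left to do after the first round. That is not the same as exactly 36 cores going idle, so it does not by itself explain the behaviour, but it is one factor to rule out when reading the usage graphs.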