
I tried joblib, but I got stuck at setting the processor affinity as explained here (the error is shown below along with my script).

Now I want to know whether there are other options or alternatives that would let me accomplish the same goal: running the same script in parallel across my 8 cores (in a fashion that resembles GNU parallel).

Error:

AttributeError: 'Process' object has no attribute 'set_cpu_affinity'

My script:

from datetime import datetime
from subprocess import call 
from joblib import Parallel, delayed
import multiprocessing
import psutil
import os

startTime = datetime.now()

pdb_name_list = []
for filename in os.listdir('/home/labusr/Documents/python_scripts/Spyder/Refinement'):
    if filename.endswith(".pdb"):
        pdb_name_list.append(filename)

num_cores = multiprocessing.cpu_count()
p = psutil.Process(os.getpid())
p.set_cpu_affinity(range(num_cores))
print(p.get_cpu_affinity())

inputs = range(2) 

def structure_refine(file_name, i):
    print('Refining structure %s round %s......\n' % (file_name, i))
    call(['/home/labusr/rosetta/main/source/bin/rosetta_scripts.linuxgccrelease',
          '-in::file::s', '/home/labusr/Documents/python_scripts/Spyder/_Refinement/%s' % file_name,
          '-parser::protocol', '/home/labusr/Documents/A_asymm_refine.xml',
          '-parser::script_vars', 'denswt=35', 'rms=1.5', 'reso=4.3', 'map=/home/labusr/Documents/tubulin_exercise/masked_map_center.mrc',
          'testmap=/home/labusr/Documents/tubulin_exercise/mmasked_map_top_centered_resampled.mrc',
          '-in:ignore_unrecognized_res',
          '-edensity::mapreso', '4.3',
          '-default_max_cycles', '200',
          '-edensity::cryoem_scatterers',
          '-beta',
          '-out::suffix', '_%s' % i,
          '-crystal_refine'])
    print('Time for refining %s round %s is: \n' % (file_name, i), datetime.now() - startTime)

for file_name in pdb_name_list:
    Parallel(n_jobs=num_cores)(delayed(structure_refine)(file_name, i) for i in inputs)
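As an aside on the AttributeError itself: psutil 2.0 replaced `set_cpu_affinity()`/`get_cpu_affinity()` with a single `cpu_affinity()` method, and on Linux the standard library can do the same pinning without psutil at all. A minimal sketch, assuming a Linux host (`os.sched_setaffinity` does not exist on macOS or Windows, hence the capability check):

```python
import os

# psutil >= 2.0 renamed set_cpu_affinity()/get_cpu_affinity() to cpu_affinity();
# on Linux the stdlib offers the same thing via os.sched_setaffinity.
if hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)   # cores this process may currently use
    os.sched_setaffinity(0, allowed)    # pin to all of them (0 = current process)
    print(sorted(allowed))
```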
baduker
  • Your question does not explain why you do not use GNU Parallel. Can you elaborate on that? https://oletange.wordpress.com/2018/03/28/excuses-for-not-installing-gnu-parallel/ – Ole Tange Apr 03 '18 at 06:46
  • Actually I can use and did use GNU parallel, but I want to run this script on a remote machine, and I rather handle this using python, and not parallel. – ahmadkhalifa Apr 03 '18 at 17:06
  • I believe I can understand wanting to do everything yourself. But I do not understand why running on a remote machine would be a barrier given https://oletange.wordpress.com/2018/03/28/excuses-for-not-installing-gnu-parallel/ Can you specify what stops you from using GNU Parallel on the remote machine? – Ole Tange Apr 04 '18 at 07:19
  • It's a server that uses the Slurm workload manager. – ahmadkhalifa Apr 10 '18 at 15:08
  • Slurm does not prevent you from using GNU Parallel (do a search for 'slurm gnu parallel' to see how). Are there any other reasons why you do not use GNU Parallel? – Ole Tange Apr 10 '18 at 21:23

1 Answer


The simplest thing to do is to launch multiple Python processes, e.g. from the command line. To make each process handle its own file, pass the filename when invoking Python:

python myscript.py filename

The passed filename is then available in Python via

import sys
filename = sys.argv[1]
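To fan such invocations out over all 8 cores from within Python (GNU-parallel style), one option is a thread pool whose workers each launch a separate OS process; threads are fine here because the real work happens in the child processes. A hedged sketch, with a trivial placeholder command standing in for the `rosetta_scripts` call from the question:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_one(args):
    # Each worker thread launches one child process, so up to 8 commands
    # run concurrently; returncode 0 means the command succeeded.
    return subprocess.run(args, check=False).returncode

# Placeholder commands; in the question's setting each entry would be the
# full rosetta_scripts.linuxgccrelease argument list for one .pdb file.
commands = [[sys.executable, "-c", f"print('job {i}')"] for i in range(4)]

with ThreadPoolExecutor(max_workers=8) as pool:
    codes = list(pool.map(run_one, commands))
print(codes)
```

With `max_workers=8`, at most 8 child processes run at once, which matches the question's 8-core goal without any affinity calls.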
jmd_dk