
I'm new to the world of multi-threading and I'm still not sure I've understood it correctly.

I'm trying to execute a command on a large set of files, but instead of running the command on one file at a time (or on all of them at once), I would like to run it X times in parallel (X being a number of threads I determine).

The problem I'm having is that even if I specify a thread value of 1, all the files are processed at the same time, using 100% CPU (which is what I'm trying to avoid).

Here is the code:

import multiprocessing
import subprocess

cpu = 3

def actual_command(filename):
    # pipe samtools mpileup into bcftools call, writing one .vcf per input file
    bash_command1 = "samtools mpileup -f folder/ref_genome.fasta -u {}".format(filename)
    bash_command2 = "bcftools call -mv > {}.vcf".format(filename.split('.')[0])
    com = bash_command1 + "|" + bash_command2
    subprocess.Popen(com, shell=True)

def processing():
    # saved_files holds the path to a text file listing one input file per line
    list_files = []
    with open(saved_files) as f:
        for line in f:
            list_files.append(line.strip())
    p = multiprocessing.Pool(processes=int(cpu))
    p.map(actual_command, list_files)
    p.close()
    p.join()

Since Pool lets me choose the number of processes I want Python to use for a specific part of the code, I was expecting the script to work through the whole list of files three at a time, but it seems I'm doing something incorrect, so I could use some help.
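
To illustrate what I mean, a toy version with a worker that actually blocks (time.sleep standing in for the real command) should process only three items at a time:

import multiprocessing
import time

def fake_command(filename):
    # stands in for the real command: each call blocks for 2 seconds,
    # so Pool(processes=3) can only run three of these at any one time
    print("start", filename)
    time.sleep(2)
    print("done", filename)

if __name__ == '__main__':
    p = multiprocessing.Pool(processes=3)
    p.map(fake_command, ["file{}.txt".format(i) for i in range(9)])
    p.close()
    p.join()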

Thanks

Rei
  • Take a look at [this question](https://stackoverflow.com/questions/2837214/python-popen-command-wait-until-the-command-is-finished); the problem is not in the multiprocessing but in your use of `Popen`, which does not wait for completion by default (see the sketch after these comments) – FlyingTeller Jul 11 '18 at 14:32
  • Multiprocessing is not threading, and you aren't actually choosing the number of CPUs but the number of independent interpreter processes to start. The number of CPUs used is up to your operating system – juanpa.arrivillaga Jul 11 '18 at 15:53
  • Does the problem also lie in the missing return statement? – Carsten Aug 20 '19 at 11:05
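
Following up on FlyingTeller's comment, here is a minimal sketch of the worker made blocking, so that the pool actually caps the number of concurrent pipelines at `cpu` (this assumes `samtools` and `bcftools` are on the PATH and keeps the question's own placeholder paths):

import subprocess

def actual_command(filename):
    # subprocess.run blocks until the whole pipeline finishes, so each
    # pool worker handles one file at a time and Pool(processes=3)
    # never runs more than three pipelines concurrently
    com = ("samtools mpileup -f folder/ref_genome.fasta -u {} "
           "| bcftools call -mv > {}.vcf").format(filename, filename.split('.')[0])
    subprocess.run(com, shell=True, check=True)

Equivalently, keeping `subprocess.Popen` but calling `.wait()` on the object it returns has the same blocking effect.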
