I'm new to multithreading and I'm still not sure I understand it correctly.
I'm trying to run a command over a large set of files, but instead of processing them one at a time (or all at once), I would like the command to run on X files at a time (X being a number of worker processes I choose).
The problem I'm having is that even when I specify a value of 1, all the files are processed at the same time, using 100% CPU (which is what I'm trying to avoid).
Here is the code:
import multiprocessing
import subprocess

cpu = 3

def actual_command(filename):
    # Build the samtools | bcftools pipeline for one input file
    bash_command1 = "samtools mpileup -f folder/ref_genome.fasta -u {}".format(filename)
    bash_command2 = "bcftools call -mv > {}.vcf".format(filename.split('.')[0])
    com = bash_command1 + " | " + bash_command2
    # Launch the pipeline in a shell
    subprocess.Popen(com, shell=True)

def processing():
    # saved_files is the path to a text file listing the inputs (defined elsewhere)
    list_files = []
    with open(saved_files) as f:
        for line in f:
            list_files.append(line.strip())
    p = multiprocessing.Pool(processes=int(cpu))
    p.map(actual_command, list_files)
    p.close()
    p.join()

if __name__ == "__main__":
    processing()
Since Pool lets me choose how many processes Python uses for this part of the code, I was expecting the script to work through the whole list of files three at a time, but it seems I'm doing something wrong, so I could use some help.
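To check my understanding of Pool itself, here is a toy sketch I would use (sleep_one is just a made-up stand-in for the real pipeline); with processes=3 I would expect nine one-second tasks to take about three seconds, not one:

import multiprocessing
import time

def sleep_one(i):
    # stand-in for the real samtools/bcftools pipeline
    time.sleep(1)
    return i

if __name__ == "__main__":
    start = time.time()
    with multiprocessing.Pool(processes=3) as p:
        p.map(sleep_one, range(9))
    # 9 tasks / 3 workers at 1 s each should take about 3 s
    print("elapsed: {:.1f}s".format(time.time() - start))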
Thanks