I'm new to python, and especially new to multiprocessing/multithreading. I have trouble reading the documentation, or finding a sufficiently similar example to work off of.
The part that I am trying to divide among multiple cores is italicized, the rest is there for context. There are three functions that are defined elsewhere in the code, NextFunction(), QualFunction(), and PrintFunction(). I don't think what they do is critical to parallelizing this code, so I did not include their definitions.
Can you help me parallelize this?
So far, I've looked at https://docs.python.org/2/library/multiprocessing.html
Python Multiprocessing a for loop
and I've tried the equivalents for multithreading, and I've tried ipython.parallel as well.
The code is intended to pull data from a file, process it through a few functions and print it, checking for various conditions along the way.
The code looks like:
def main(arg, obj1Name, obj2Name):
global dblen
records = fasta(refName)
for i,r in enumerate(records):
s = r.fastasequence
idnt = s.name.split()[0]
reference[idnt] = s.seq
names[i] = idnt
dblen += len(s.seq)
if taxNm == None: taxid[idnt] = GetTaxId(idnt).strip()
records.close()
print >> stderr, "Read it"
# read the taxids
if taxNm != None:
file = open(taxNm, "r")
for line in file:
idnt,tax = line.strip().split()
taxid[idnt] = tax
file.close()
File1 = pysam.Samfile(obj1Name, "rb")
File2 = pysam.Samfile(obj2Name, "rb")
***for obj1s,obj2s in NextFunction(File1, File2):
qobj1 = []
qobj2 = []
lobj1s = list(QualFunction(obj1s))
lobj2s = list(QualFunction(obj2s))
for obj1,ftrs1 in lobj1s:
for obj2,ftrs2 in lobj2s:
if (obj1.tid == obj2.tid):
qobj1.append((obj1,ftrs1))
qobj2.append((obj2,ftrs2))
for obj,ftrs in qobj1:
PrintFunction(obj, ftrs, "1")
for obj,ftrs in qobj2:
PrintFunctiont(obj, ftrs, "2")***
File1.close()
File2.close()
And is called by
if __name__ == "__main__":
etc