I'm writing a Python script that uses the multiprocessing library to launch multiple tesseract instances in parallel. When I call tesseract several times in sequence from a loop, it works. However, when I try to run the calls in parallel, everything looks fine but I get no results (I waited 10 minutes).
In my code, I try to OCR the individual pages of a PDF after splitting them out of the original multi-page PDF.
Here's my code:
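import subprocess
from multiprocessing import Pool

def processPage(i):
    # Input image and output base name for page i
    nameJPG="converted-"+str(i)+".jpg"
    nameHocr="converted-"+str(i)
    p=subprocess.check_call(["tesseract",nameJPG,nameHocr,"-l","eng","hocr"])
    print "tesseract did the job for the ",str(i+1),"page"

# Up to 4 worker processes, one task per page
pool1=Pool(4)
pool1.map(processPage, range(len(pdf.pages)))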
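For comparison, here is a minimal sketch of the sequential variant described above, the one that works. It is not the exact loop from my script: the helper names build_command and run_sequential are made up for illustration, and the runner parameter is an assumption added so the loop can be exercised without tesseract installed. The command-line arguments are the same ones processPage uses.

```python
import subprocess


def build_command(i):
    """Build the tesseract command line for page index i (same arguments as processPage)."""
    name_jpg = "converted-" + str(i) + ".jpg"
    name_hocr = "converted-" + str(i)
    return ["tesseract", name_jpg, name_hocr, "-l", "eng", "hocr"]


def run_sequential(page_count, runner=subprocess.check_call):
    """Run tesseract on each page in turn, one process at a time.

    `runner` is a hypothetical hook: by default it is subprocess.check_call,
    but any callable that accepts the command list can be substituted.
    Returns the list of commands that were run, in order.
    """
    commands = []
    for i in range(page_count):
        cmd = build_command(i)
        runner(cmd)  # blocks until this tesseract call finishes
        commands.append(cmd)
    return commands
```

Calling run_sequential(len(pdf.pages)) would process the pages one after another, which is the behaviour I am trying to parallelize with Pool.map.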