Here are a few lines from the beginning of my OCR program. Timing with the Time() function shows that these lines account for 90% of a run's time. Unfortunately, I have run out of ideas for making them faster. How would you approach speeding this up?
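import pytesseract

# doc (the page images) and ResultpageNumber (a list) are defined earlier in the program.
for page_number, page_data in enumerate(doc):
    # OCR one page; the encode/decode round trip to UTF-8 was a no-op and is dropped.
    txt = pytesseract.image_to_string(page_data, lang='eng')
    tokens = txt.split()
    # Store [1-based page number, token, token index on the page].
    for counter, token in enumerate(tokens):
        ResultpageNumber.append([page_number + 1, token, counter])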
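
One direction that might help is running the OCR for several pages at once. The following is only a minimal sketch, assuming that nearly all of the time goes into the `image_to_string` calls rather than the token loop, and that the pages can be processed independently. The helper name `ocr_pages` and its parameters `ocr` and `max_workers` are hypothetical and not part of pytesseract. Threads are used because pytesseract runs the tesseract executable as a separate process, so the Python thread mostly waits while a page is recognized.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_pages(pages, ocr, max_workers=4):
    """Run `ocr` on every page concurrently (hypothetical helper).

    Returns rows of [1-based page number, token, token index on the page],
    the same layout as the ResultpageNumber list in the loop above.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so page numbers stay correct.
        texts = list(pool.map(ocr, pages))

    rows = []
    for page_number, txt in enumerate(texts, start=1):
        for counter, token in enumerate(txt.split()):
            rows.append([page_number, token, counter])
    return rows
```

With the real OCR call this could be used as `ResultpageNumber = ocr_pages(doc, lambda img: pytesseract.image_to_string(img, lang='eng'))`. Whether it is faster depends on the number of CPU cores and on how many threads tesseract itself already uses, so `max_workers` would need to be tuned by measurement.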