I am trying to find typos in very big text files and correct them. Basically, I run this code:
import re

ocr = open("text.txt")
text = ocr.readlines()
ocr.close()

clean_text = []
for line in text:
    last = re.sub(r"^(\|)([0-9])(\s)([A-Z][a-z]+[a-z])\,", r"1\2\t\3\4,", line)
    clean_text.append(last)

new_text = open("new_text.txt", "w", newline="\n")
for line in clean_text:
    new_text.write(line)
new_text.close()
In reality I call 're.sub' more than 1,500 times per line, and 'text.txt' has 100,000 lines. Can I divide my text into pieces and use different cores for different parts?