Bear with me, I can't include my 1,000+ line program, and there are a couple of questions in the description.
I have a few types of patterns I am searching for:
# Literally just a regular word
re.search("Word", arg)
# Varying complex pattern
re.search(r"[0-9]{2,6}-[0-9]{2}-[0-9]{1}", arg)
# Words with varying case and an optional trailing special character
re.search("Supplier [Aa]ddress:?|Supplier [Ii]dentification:?|Supplier [Nn]ame:?", arg)
# I also use re.findall with the patterns above
re.findall("uses patterns above", arg)
I have about 75 of these in total, and some need to be moved into deeply nested functions.
When and where should I compile the patterns?
Right now I am trying to improve my program by compiling everything in main, then passing the correct list of compiled RegexObjects to each function that uses it. Would this improve performance?
Would doing something like the following increase the speed of my program?
re.compile("pattern").search(arg)
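To make the comparison concrete, here is a minimal sketch of how I could time the three variants myself. The names `PATTERN`, `COMPILED`, and `time_variants` are placeholders I made up for this example and are not from my real program, and I have not run this against my real data.

```python
import re
import timeit

# Placeholder pattern taken from the examples above.
PATTERN = r"[0-9]{2,6}-[0-9]{2}-[0-9]{1}"
COMPILED = re.compile(PATTERN)  # compiled once, up front


def time_variants(arg, number=100_000):
    """Time three ways of running the same search; returns total seconds per variant."""
    return {
        # Module-level function, pattern passed as a string on every call
        "re.search": timeit.timeit(
            lambda: re.search(PATTERN, arg), number=number
        ),
        # The inline form I am asking about
        "compile().search": timeit.timeit(
            lambda: re.compile(PATTERN).search(arg), number=number
        ),
        # Pattern object compiled ahead of time
        "precompiled": timeit.timeit(
            lambda: COMPILED.search(arg), number=number
        ),
    }
```

Calling `time_variants(some_document_text)` would give one timing per variant, which I could compare across a few representative documents.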
Do the compiled patterns stay in memory, so that if a function containing this line is called multiple times, the compile step is skipped after the first call? If so, I wouldn't have to pass the compiled patterns from function to function.
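For clarity, this is the kind of layout I am considering as an alternative to passing lists around: compile at module level and let the nested function refer to the name directly. It is only a sketch; `SUPPLIER_FIELD`, `find_supplier_field`, and `deeply_nested` are hypothetical names, not functions from my program.

```python
import re

# Hypothetical module-level constant: compiled once, when the module is imported.
SUPPLIER_FIELD = re.compile(
    r"Supplier [Aa]ddress:?|Supplier [Ii]dentification:?|Supplier [Nn]ame:?"
)


def find_supplier_field(text):
    """Return the matched field label, or None if no label is present."""
    match = SUPPLIER_FIELD.search(text)
    return match.group(0) if match else None


def deeply_nested(text):
    # No compiled pattern is passed in; the helper reads the module-level name.
    return find_supplier_field(text)
```

With this layout, none of the intermediate functions would need an extra parameter just to forward the patterns.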
Is it even worth compiling all of the patterns if I move the data so much?
Is there a better way to match regular words without regex?
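By "without regex" I mean something like plain string methods. A minimal sketch of what I have in mind, assuming the word is a fixed literal; `text_after_word` is a made-up helper name:

```python
def text_after_word(data, word):
    """Return the text following the first occurrence of word, or None if absent."""
    # str.partition splits once; sep is the empty string when word is not found.
    before, sep, after = data.partition(word)
    return after if sep else None
```

This does the membership check and the split in one step. Like `re.search("Word", arg)`, it matches the word anywhere, including inside a longer word.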
Short example of my code:
import re

def foo(arg, allWords):
    # Does some things with arg, then puts the result into a variable;
    # this function does not use allWords itself
    data = arg  # This is the manipulated version of arg
    return bar(data, allWords)

def bar(data, allWords):
    if allWords[0].search(data) is not None:
        temp = data.split("word1", 1)[1]
        return temp
    elif allWords[1].search(data) is not None:
        temp = data.split("word2", 1)[1]
        return temp

def main():
    allWords = [re.compile(m) for m in ["word1", "word2", "word3"]]
    arg = "This is a very long string from a text document input, the provided patterns might not be word1 in this string but I need to check for them, and if they are there do some cool things word3"
    # This loop runs a couple million times
    # because it loops through a couple million text documents
    while True:
        data = foo(arg, allWords)