I'm looking for advice on a better (faster) way to approach this. My problem is that as the "hosts" list gets longer, the program takes much longer to complete, and if "hosts" is long enough the run takes so long that the program seems to lock up.
- "hosts" is a list of lists that contains tens of thousands of items. For each item i in "hosts", i[0] is always an IP address, i[4] is always a 5-digit number stored as a string (e.g. "13579"), and i[7] is always a multi-line string.
- "searchPatterns" is a list of lists read in from a CSV file. In each row j, elements j[0] through j[3] are regex search patterns (or the string "SKIP"), and j[6] is a unique string used to identify a pattern match.
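Because the patterns are the same for every host, one option is to compile them once while reading the CSV instead of passing raw strings to re.search on every host. The sketch below assumes the CSV has no header row and that every row has at least seven columns; load_search_patterns, PATTERN_COLS and ID_COL are hypothetical names. The identifier column is a parameter because the description says index 6 while the loop below reads j[4].

```python
import csv
import io
import re

# Assumed layout: columns 0-3 hold regex patterns or the literal "SKIP".
PATTERN_COLS = range(4)
# Assumed identifier column (index 6 per the description; the loop uses 4).
ID_COL = 6


def load_search_patterns(csv_file, id_col=ID_COL):
    """Return a list of (identifier, [compiled regexes]) tuples.

    Each non-"SKIP" pattern is compiled exactly once, with re.DOTALL to
    match the flags used in the original re.search call.
    """
    rows = []
    for row in csv.reader(csv_file):
        compiled = [re.compile(row[k], re.DOTALL)
                    for k in PATTERN_COLS if row[k] != "SKIP"]
        rows.append((row[id_col], compiled))
    return rows
```

With a real file this would be called as load_search_patterns(open(path, newline="")); io.StringIO can stand in for the file when experimenting.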
My current approach is to run the regex patterns from the CSV file against the multi-line string in i[7] of every host. There are hundreds of possible matches, and I need to find all matches associated with each IP address and label each one with the unique string from the CSV file. Finally, I need to put that information into "fullMatchList" to use later.
NOTE: Even though each row in "searchPatterns" has up to 4 patterns, I only need the first one that matches; after that it can move on to the next row and continue looking for matches for that IP.
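import re

tempIP = ""          # IP whose matches are currently being collected
matchListPerIP = []  # identifiers matched so far for tempIP
fullMatchList = []   # result: [[ip, [identifier, ...]], ...]

for i in hosts:
    if i[4] == "13579" or i[4] == "24680":
        for j in searchPatterns:
            for k in range(4):
                if j[k] == "SKIP":
                    continue
                match = re.search(j[k], i[7], flags=re.DOTALL)
                if match is not None:
                    if tempIP == "":
                        tempIP = i[0]
                        matchListPerIP.append(j[4])
                    elif tempIP == i[0]:
                        matchListPerIP.append(j[4])
                    else:
                        # New IP: store the previous IP's matches and start over
                        fullMatchList.append([tempIP, matchListPerIP])
                        tempIP = i[0]
                        matchListPerIP = [j[4]]
                    break  # only the first matching pattern in this row counts

# Store the last IP's matches (skipped if nothing matched at all)
if tempIP:
    fullMatchList.append([tempIP, matchListPerIP])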
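For comparison, here is a sketch of the same loop, assuming the patterns have already been compiled once (for example with re.compile(pattern, re.DOTALL)) and stored as (identifier, [compiled regexes]) pairs. It collects identifiers in a dictionary keyed by IP, so unlike the tempIP bookkeeping it does not rely on "hosts" being sorted by IP. build_match_list and WANTED_PORTS are hypothetical names.

```python
import re
from collections import defaultdict

# Port values to keep; the loop above compares i[4] against these strings.
WANTED_PORTS = {"13579", "24680"}


def build_match_list(hosts, pattern_rows):
    """Group matched pattern identifiers by IP address.

    hosts: rows where h[0] is the IP, h[4] the port string, h[7] the text.
    pattern_rows: (identifier, [compiled regexes]) pairs.
    Returns [[ip, [identifier, ...]], ...] in first-seen IP order.
    """
    matches_by_ip = defaultdict(list)  # dicts keep insertion order (3.7+)
    for host in hosts:
        if host[4] not in WANTED_PORTS:
            continue
        text = host[7]
        for ident, regexes in pattern_rows:
            # any() stops at the first regex that matches, like the break
            if any(rx.search(text) for rx in regexes):
                matches_by_ip[host[0]].append(ident)
    return [[ip, ids] for ip, ids in matches_by_ip.items()]
```

Precompiling avoids depending on the re module's internal cache of recently used patterns, which has a limited size. It does not reduce the number of searches (hosts times patterns), though, so an individually slow pattern stays slow.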
Here's an example regex search pattern from the CSV file:
(?!(.*?)\br2\b)cpe:/o:microsoft:windows_server_2008:
That pattern is intended to identify Windows Server 2008, and includes a negative lookahead to avoid matching the R2 edition.
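One possible hot spot is the lookahead at the front of this pattern. re.search tries every starting position in i[7], and at each one the lookahead (with re.DOTALL) may scan to the end of the string before the literal cpe:/o:... is even checked, so a single search can take time roughly proportional to the square of the text length. Below is a sketch of one rewrite, assuming the intent is "the prefix, not followed anywhere later in the text by the word r2": with the literal first, the lookahead only runs where the prefix actually occurs. The sample strings are hypothetical. For this pattern the two forms should agree because the prefix itself contains no r2; other patterns in the CSV would each need checking before a similar reordering.

```python
import re

# Pattern as it appears in the CSV: lookahead first, literal second.
ORIGINAL = r"(?!(.*?)\br2\b)cpe:/o:microsoft:windows_server_2008:"
# Reordered: literal first, so the lookahead runs only after the prefix
# matches (the unused capture group is dropped).
REORDERED = r"cpe:/o:microsoft:windows_server_2008:(?!.*?\br2\b)"

orig_rx = re.compile(ORIGINAL, re.DOTALL)
new_rx = re.compile(REORDERED, re.DOTALL)

# Hypothetical inputs covering the cases the lookahead is meant to handle.
samples = [
    "os: cpe:/o:microsoft:windows_server_2008::sp2",
    "os: cpe:/o:microsoft:windows_server_2008:r2:sp1",
    "cpe:/o:microsoft:windows_server_2008:\nsecond line r2",
    "r2 mentioned first\ncpe:/o:microsoft:windows_server_2008:",
    "os: cpe:/o:microsoft:windows_7",
]

# Each pair is (original matched?, reordered matched?); they should agree.
results = [(bool(orig_rx.search(s)), bool(new_rx.search(s))) for s in samples]
for pair in results:
    print(pair)
```

Timing both forms with the timeit module on a long i[7] value would show whether this pattern is where the time goes.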
I'm new to Python, so any advice is appreciated! Thank you!