All, I am rather new to Python and am looking for assistance. I need to run a string search (a list of IP addresses) against a log data set that is about 20 GB compressed. I have an eight-core Ubuntu box with 32 GB of RAM to crunch through this, but I haven't been able to work out the best approach for such a task. Would threading or multiprocessing be better suited here? Code samples would be appreciated. Thank you. Here is my current code:
#!/usr/bin/python
import sys
logs = []
iplist = []
logs = open(sys.argv[1], 'r').readlines()
iplist = open(sys.argv[2], 'r').readlines()
print "+Loaded {0} entries for {1}".format(len(logs), sys.argv[1])
print "+Loaded {0} entries for {1}".format(len(iplist), sys.argv[2])
for a in logs:
    for b in iplist:
        # check whether the IP (b) appears in the log line (a)
        if b.lower().strip() in a.lower().strip():
            print "Match! --> {0}".format(a.lower().strip())
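For reference, here is the direction I was thinking of going: a minimal multiprocessing sketch (my understanding is that threading won't help much here because of the GIL for CPU-bound matching). The sample data, chunk size, and worker count are placeholders I made up, not my real files:

```python
import multiprocessing as mp

def scan_chunk(args):
    """Scan one chunk of log lines for any of the given IP strings."""
    lines, ips = args
    matches = []
    for line in lines:
        low = line.lower()
        for ip in ips:
            if ip in low:
                matches.append(line.strip())
                break  # one match per line is enough
    return matches

def parallel_search(log_lines, ip_list, workers=8, chunk_size=100000):
    """Split the logs into chunks and scan them across a process pool."""
    ips = [ip.strip().lower() for ip in ip_list if ip.strip()]
    chunks = [(log_lines[i:i + chunk_size], ips)
              for i in range(0, len(log_lines), chunk_size)]
    with mp.Pool(workers) as pool:
        results = pool.map(scan_chunk, chunks)
    # flatten the per-chunk result lists
    return [m for chunk in results for m in chunk]

if __name__ == '__main__':
    # placeholder sample data for illustration only
    sample_logs = ["10.0.0.1 GET /index\n",
                   "192.168.1.5 POST /login\n",
                   "8.8.8.8 ping\n"]
    sample_ips = ["192.168.1.5\n", "8.8.8.8\n"]
    print(parallel_search(sample_logs, sample_ips, workers=2, chunk_size=2))
    # Expected: ['192.168.1.5 POST /login', '8.8.8.8 ping']
```

(Note this sketch is Python 3, unlike my script above, and still holds the lines in memory; for the real 20 GB set I assume I'd stream the file and hand out chunks as they're read.)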