I am a beginner at python and I need to check the presence of a given set of string in a huge txt file. I've written this code so far and it runs with no problems on a light subsample of my database. The problem is that it takes more than 10 hours when searching through the whole database and I'm looking for a way to speed up the process.
The code so far reads a list of strings from a txt I've put together (list.txt) and search for every item in every line of the database (hugedataset.txt). My final output should be a list of items which are present in the database (or, alternatively, a list of items which are NOT present). I bet there is a more efficient way to do things though...
Thank you for your support!
import re
fobj_in = open('hugedataset.txt')
present=[]
with open('list.txt', 'r') as f:
list1 = [line.strip() for line in f]
print list1
for l in fobj_in:
for title in list1:
if title in l:
print title
present.append(title)
set=set(presenti)
print set