I have a very large text file containing duplicate lines, and I want to eliminate the duplicates. The order of the entries does not matter, because the file will be sorted later.
Here is what I have so far:
# Track every line seen so far; set membership tests are O(1) on average.
unique_lines = set()

with open("MasterList.txt", "r", encoding="latin-1") as infile, \
        open("UniqueMasterList.txt", "w", encoding="latin-1") as outfile:
    for line in infile:
        if line not in unique_lines:  # write each line only the first time it appears
            outfile.write(line)
            unique_lines.add(line)
It has been running for 30 minutes and still has not finished, which is far too slow for my needs. What is a faster approach in Python?
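For what it is worth, one alternative I have been considering is letting set() do the deduplication in a single pass and writing the results out afterwards. This is only a sketch, and it assumes the whole file fits in memory, which I have not verified for my data:

# Variant under consideration: build the set of distinct lines in one expression.
# Assumes the entire file fits in memory (unverified for this file).
with open("MasterList.txt", "r", encoding="latin-1") as infile:
    unique_lines = set(infile)  # iterating a file yields lines, duplicates collapse

with open("UniqueMasterList.txt", "w", encoding="latin-1") as outfile:
    outfile.writelines(unique_lines)  # output order is arbitrary, which is fine here

Would something like this actually be faster, or is there a better approach entirely?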