I searched on here and found many postings, but none that I could work into the following code:
with open('TEST.txt') as f:
    seen = set()
    for line in f:
        line_lower = line.lower()
        if line_lower in seen and line_lower.strip():
            print(line.strip())   # duplicate line
        else:
            seen.add(line_lower)  # first time this line has appeared
With this I can find the duplicate lines inside my TEST.txt file, which contains hundreds of URLs.
However, I need to remove these duplicates and create a new text file with the duplicates removed and all the other URLs intact.
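Here is a rough sketch of what I think I'm after, in case it helps: keep the first occurrence of each URL and write it to a new file, skipping anything already seen (the output file name is just a placeholder), but I'm not sure it's the right way to do it:

seen = set()
with open('TEST.txt') as f, open('TEST_deduped.txt', 'w') as out:
    for line in f:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        key = url.lower()  # case-insensitive, same comparison as above
        if key not in seen:
            seen.add(key)          # remember this URL
            out.write(url + '\n')  # first occurrence is kept

This should keep the original order and only drop the repeats.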
I will be checking this newly created file for 404 errors using r.status_code.
In a nutshell, I need help getting rid of the duplicates so I can check for dead links. Thanks for your help.
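For reference, this is roughly how I plan to do the 404 check afterwards, assuming the requests library and the deduplicated file name from the sketch above:

import requests

with open('TEST_deduped.txt') as f:
    for line in f:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        try:
            r = requests.get(url, timeout=10)
            if r.status_code == 404:
                print('dead link:', url)
        except requests.RequestException as e:
            print('could not fetch', url, '-', e)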