If file2 is not too big, create a set of all its lines, split each file1 line on ";", and check whether the second element is in the set:
import fileinput
import sys

with open("file2.txt") as f:
    # names to remove, one per line of file2
    lines = set(map(str.rstrip, f))  # use itertools.imap on python2

for line in fileinput.input("file1.txt", inplace=True):
    # if FILENAME1 etc.. is not in the set, write the line back
    if line.rstrip().split(";")[1] not in lines:
        sys.stdout.write(line)
file1:
LINK1;FILENAME1
LINK2;FILENAME2
LINK3;FILENAME3
LINK1;FILENAME4
LINK2;FILENAME5
LINK3;FILENAME6
file2:
FILENAME1
FILENAME2
FILENAME3
file1 after:
LINK1;FILENAME4
LINK2;FILENAME5
LINK3;FILENAME6
fileinput.input with inplace=True changes the original file: whatever is written to stdout inside the loop goes into the file. You don't need to store the lines in a list.
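If you want a safety net with the in-place approach, fileinput can also keep a copy of the original. The following is only a minimal sketch: `drop_listed` and its `sep` parameter are hypothetical names introduced for illustration, and it assumes that lines without a second field should be kept rather than raise an IndexError.

```python
import fileinput
import sys


def drop_listed(path, names, sep=";"):
    """Rewrite `path` in place, dropping lines whose second field is in `names`."""
    # backup=".bak" leaves the original content in path + ".bak"
    with fileinput.input(path, inplace=True, backup=".bak") as f:
        for line in f:
            fields = line.rstrip().split(sep)
            # assumption: lines with no second field are kept unchanged
            if len(fields) < 2 or fields[1] not in names:
                sys.stdout.write(line)  # stdout is redirected into `path`
```

Calling `drop_listed("file1.txt", lines)` with the set built from file2 would do the same filtering as above, with the untouched original left in file1.txt.bak.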
You can also write the lines you want to keep to a tempfile and then use shutil.move to replace the original file:
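from tempfile import NamedTemporaryFile
from shutil import move

# mode "w": NamedTemporaryFile defaults to binary ("w+b"), which rejects str on python3
with open("file2.txt") as f, open("file1.txt") as f2, NamedTemporaryFile("w", dir=".", delete=False) as out:
    lines = set(map(str.rstrip, f))
    for line in f2:
        if line.rstrip().split(";")[1] not in lines:
            out.write(line)
# replace the original only after all the files are closed
move(out.name, "file1.txt")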
If your code raises an error partway through, you won't lose any data in the original file when using a tempfile.
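To make that guarantee explicit, the tempfile version can be wrapped so that a failure also removes the half-written temp file. This is a sketch under assumptions, not a definitive implementation: `rewrite_without` is a hypothetical helper, it uses os.replace instead of shutil.move, and it keeps lines that have no second field.

```python
import os
from tempfile import NamedTemporaryFile


def rewrite_without(path, names, sep=";"):
    """Replace `path` with a copy that omits lines whose second field is in `names`."""
    directory = os.path.dirname(os.path.abspath(path))
    # text mode so str lines can be written; same directory as the target
    # so the final rename stays on one filesystem
    tmp = NamedTemporaryFile("w", dir=directory, delete=False)
    try:
        with tmp, open(path) as src:
            for line in src:
                fields = line.rstrip().split(sep)
                if len(fields) < 2 or fields[1] not in names:
                    tmp.write(line)
        os.replace(tmp.name, path)  # only reached if the loop finished
    except BaseException:
        os.unlink(tmp.name)  # discard the partial copy; the original is untouched
        raise
```

The original file is only swapped out on the last step, so an exception anywhere before that leaves it exactly as it was.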
Using a set to store the lines gives O(1) lookups on average, so the whole pass is linear. Storing the lines in a list would make every lookup a scan of the list and the overall solution quadratic, which is significantly slower for larger files. There is also no need to read all the lines of the other file into a list with readlines, as you can do your lookups and write as you iterate over the file object.
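A rough way to see the difference between the two containers is to time a membership test on each. The sizes and names below are made up for illustration, and the absolute timings will vary by machine:

```python
import timeit

# 20000 made-up names, stored both ways
names_list = ["FILENAME%d" % i for i in range(20000)]
names_set = set(names_list)
probe = "FILENAME19999"  # last element: worst case for the list scan

# each call performs 200 membership tests
list_time = timeit.timeit(lambda: probe in names_list, number=200)
set_time = timeit.timeit(lambda: probe in names_set, number=200)
print("list: %.6fs  set: %.6fs" % (list_time, set_time))
```

The list has to be scanned element by element for every lookup, while the set hashes the string and jumps straight to it.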