0

file.txt file2.txt

I'm trying to make a Python script that will do the following:

Read file.txt and file2.txt. If a line in file.txt also appears in file2.txt, remove it from file.txt.

This is what I've done:

file1 = open('file.txt', 'r').readlines()
file2 = open('file2.txt', 'r').readlines()
Removed = open('Removed.txt', 'a')
for line in file2:
    if line not in file1:
        Removed.write(str(line) + '\n')
Removed.close()

It just removed it from file2.txt

martineau
Zenzureal
  • StackOverflow is not a coding service. Please read the following documentation, then [edit] and rephrase the question. [Take the Tour](https://stackoverflow.com/tour) & [How to ask a good question](https://stackoverflow.com/help/how-to-ask). Always [Provide a Minimal, Reproducible Example (e.g. code, data, errors) as text](https://stackoverflow.com/help/minimal-reproducible-example) & you're expected to [try to solve the problem first](https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users). – Trenton McKinney Jun 20 '20 at 22:06
  • Merge two files, sort them and write unique entries into a new. – Olvin Roght Jun 20 '20 at 22:06
  • Do you need to keep the lines in the same order? – Mark Ransom Jun 20 '20 at 22:18
  • @MarkRansom Basically, I want it to check whether anything in file.txt is also in file2.txt, then create a new file (removed.txt, for example) containing everything from file1 except the lines that also appear in file2. – Zenzureal Jun 20 '20 at 22:22
  • @MarkRansom I don't mean removing it from the file itself. Not physically, just logically: rewrite the contents without it. – Zenzureal Jun 20 '20 at 22:22
  • [Compare two files report difference in python](https://stackoverflow.com/questions/19120489/compare-two-files-report-difference-in-python) – Trenton McKinney Jun 20 '20 at 22:28

3 Answers

1

If you don't care about the order of the lines in file.txt after removing the shared lines, you could take the set difference of the two files' lines (after converting them to set objects). Try something like the following:

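with open('file.txt') as f1, open('file2.txt') as f2:
    # Convert each file's lines to a set (line endings stripped)
    f1_lines = set(f1.read().splitlines())
    f2_lines = set(f2.read().splitlines())

# Set difference: lines in file.txt that are not in file2.txt (order is lost)
new_data = f1_lines - f2_lines

# Overwrite file.txt with the remaining lines
with open('file.txt', 'w') as f1:
    f1.write("\n".join(new_data))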
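If order or repeated lines do matter, a set difference may not be what you want. Here is a minimal sketch with made-up in-memory data (the names sample_f1, sample_f2, and exclude are illustrative, not from the question) comparing the set difference with an order-preserving filter:

```python
# Made-up sample data standing in for the lines of the two files
sample_f1 = ["banana", "apple", "cherry", "apple"]
sample_f2 = ["cherry"]

# Set difference: iteration order is not guaranteed and duplicates collapse
unordered = set(sample_f1) - set(sample_f2)

# Order-preserving alternative: keep a list, filter against a set
exclude = set(sample_f2)
ordered = [line for line in sample_f1 if line not in exclude]

print(sorted(unordered))  # ['apple', 'banana']
print(ordered)            # ['banana', 'apple', 'apple']
```

The filter still gets fast membership tests from the set, but it keeps the original order and any repeated lines.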
revliscano
0

The code you have is fine, but it will be very slow on large files. The line if line not in file1: does a linear scan of the whole list for every line in file2. If you make file1 a set, things will go much faster, because a set membership test is a hash lookup rather than a linear scan.

file1 = set(open('file.txt', 'r').readlines())

You will also want to close the files after you've read them, or better yet use a with statement so they get automatically closed.
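Putting both suggestions together might look like the sketch below. The function name write_lines_not_in and its parameters are hypothetical, and it assumes the goal stated in the question's comments: keep every line of the first file that does not appear in the second.

```python
def write_lines_not_in(source_path, exclude_path, output_path):
    """Write each line of source_path that does not appear in exclude_path."""
    with open(exclude_path) as exclude_file:
        # Strip the trailing newline so a missing final newline
        # in either file doesn't cause a mismatch
        exclude = {line.rstrip('\n') for line in exclude_file}

    # Both files are closed automatically when the with block ends
    with open(source_path) as source, open(output_path, 'w') as output:
        for line in source:
            text = line.rstrip('\n')
            if text not in exclude:  # hash lookup, not a linear scan
                output.write(text + '\n')
```

With the file names from the question, the call would be write_lines_not_in('file.txt', 'file2.txt', 'Removed.txt'). Opening the output with 'w' instead of 'a' means rerunning the script replaces Removed.txt rather than appending to it.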

Trenton McKinney
Mark Ransom
  • It works, but it doesn't do what I'm aiming for. I want to write every line of file1 except those that also appear in file2. This code just prints the duplicates between the files – Zenzureal Jun 20 '20 at 22:30
  • @Zenzureal then fix your original code, because it doesn't do what you want. – Mark Ransom Jun 20 '20 at 23:02
0

This is what you're looking for. It also preserves the line order:

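with open('file.txt') as f1, open('file2.txt') as f2:
    file1_lines = f1.read().splitlines()
    # A set makes each membership test a hash lookup
    file2_lines = set(f2.read().splitlines())

# Build a new list instead of removing items while iterating,
# which would skip the element that follows each removal
kept_lines = [line for line in file1_lines if line not in file2_lines]

with open('Removed.txt', 'w') as f3:
    f3.write("\n".join(kept_lines))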

Check whether anything in file.txt is also in file2.txt, then create a new file (removed.txt, for example) containing everything from file1 except the lines that also appear in file2. – @Zenzureal
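A general caution when filtering a list in place: calling remove() on a list inside a for loop over that same list makes the loop skip the element that follows each removal. A minimal sketch with made-up data (the names lines, to_remove, mutated, and filtered are illustrative):

```python
lines = ["a", "b", "c"]
to_remove = {"a", "b"}

# Removing while iterating: once "a" is removed, "b" shifts into
# index 0 and the loop moves on to index 1, so "b" is never checked
mutated = list(lines)
for line in mutated:
    if line in to_remove:
        mutated.remove(line)

# Building a new list checks every element exactly once
filtered = [line for line in lines if line not in to_remove]

print(mutated)   # ['b', 'c']
print(filtered)  # ['c']
```

This only bites when two matching lines sit next to each other, so it can go unnoticed on small test files.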

Peyman Majidi