I have two text files, file1.txt and file2.txt, that I need to compare and remove duplicate lines. The two files are not equal in size.

I've tried using filecmp and opening both files with open(), but neither approach works.

with open('crones.txt', 'r') as file1:
    with open('destino.txt', 'r+') as file2:
        lineas = file1.readlines()
        same = set(file1).intersection(file2)

        file2.close()
        file1.close()
        #file1 = open("crones.txt","w")
        #for linea in lineas:
        #    if linea!=same+"\n":
        #        f.write(linea)


        print same
        print lineas


#same.discard('\n')

#with open('some_output_file.txt', 'w') as FO:
 #   for line in same:
 #       FO.write(line)
user3795147
  • Please [edit] your question and post the code that you have tried. StackOverflow is not a code-writing service, but we may be able to help you solve whatever problems you may be experiencing. Be sure to include the full text of any error messages. – MattDMo Jul 01 '14 at 19:51
  • We cannot help if you don't include your code and explain exactly what you're trying to do and what does not work – Tim Jul 01 '14 at 19:51
  • Your code? What you have tried? Example of your input? – Shadow9043 Jul 01 '14 at 19:52
  • @dawg [No lmgtfy links please. They're banned for a reason.](http://meta.stackexchange.com/questions/15650/ban-lmgtfy-let-me-google-that-for-you-links) – thegrinner Jul 01 '14 at 19:58
  • @user3795147 Almost there. Now add two sample files file1.txt and file2.txt and the actual results to your question. – Jan Doggen Jul 02 '14 at 12:47

1 Answer

This may be what you're looking for:

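with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    a = f1.readlines()  # all lines of file1.txt, newlines included
    b = f2.readlines()  # all lines of file2.txt, newlines included

# Keep the lines of a that are absent from b, then the lines of b absent from a.
non_duplicates = [line for line in a if line not in b]
non_duplicates += [line for line in b if line not in a]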

non_duplicates is now a list that contains all of the lines from each file that do not occur in both files. If you're worried about duplicates within each file as well, you can add this (note that converting to a set discards the original line order):

non_duplicates = list(set(non_duplicates))
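For large files, the `line not in b` checks rescan a whole list for every line. Here is a minimal sketch of the same idea that uses sets for the membership tests and still keeps the original line order. The function name `unique_to_each` is hypothetical. It assumes lines should be compared without their trailing newline, so that a final line lacking a newline still matches. It also drops repeats within each file.

```python
def unique_to_each(lines_a, lines_b):
    """Return the lines that appear in only one of the two inputs, in order."""
    # Assumption: strip line endings so "foo\n" and a final "foo" compare equal.
    a = [line.rstrip("\n") for line in lines_a]
    b = [line.rstrip("\n") for line in lines_b]
    set_a, set_b = set(a), set(b)

    result, seen = [], set()
    # First the lines of a that are missing from b, then the reverse.
    for lines, other in ((a, set_b), (b, set_a)):
        for line in lines:
            if line not in other and line not in seen:
                seen.add(line)  # skip repeats within a single file
                result.append(line)
    return result


print(unique_to_each(["a\n", "b\n", "c\n"], ["b\n", "d"]))  # ['a', 'c', 'd']
```

Open file objects can be passed in directly, for example `unique_to_each(f1, f2)` inside the `with open(...)` block.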

Some other notes:

  1. Listen to the comments posted here. You will get more answers if you follow the guidelines in the Stack Overflow help section on asking good questions. Specifically, make sure you attempt to write your code first. If it doesn't work, post the EXPLICIT error messages and describe the input that led to them.
  2. You should do some reading on the with statement, with particular emphasis on opening files with it.
  3. It's good that you posted some code, but get rid of the commented-out clutter. We don't want to see your messy, hacked-up, in-progress code. The code you post should be a minimal, complete, and verifiable example that demonstrates the issue you're having.
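On note 2: a `with` block closes the file when the block exits, so explicit `close()` calls like the ones in the question are unnecessary. A file object can also be read through only once, which is likely why `set(file1)` after `file1.readlines()` comes back empty. Here is a small sketch, assuming Python 3 and using a hypothetical temporary file so that it runs anywhere:

```python
import os
import tempfile

# Create a throwaway file for the demonstration (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("one\ntwo\n")

with open(path) as f:
    lines = f.readlines()  # consumes the whole file
    leftover = set(f)      # the file is already at EOF, so nothing is left

print(lines)     # ['one\n', 'two\n']
print(leftover)  # set()
print(f.closed)  # True: leaving the with block closed the file
```

To use the contents more than once, read them into a list once and work from that list.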
skrrgwasme