
I have two text files that contain data like the example below. Using Python, I want to save to a new file the line numbers of the lines in the first file that do not appear in the second file. The line numbers are not included in the files.

-----line----first file ---------- second file

   1   116969        116969
   2   116973        116977
   3   116977        117000
   4   117000        117028
   5   117004        117032
   6   117028        117036
   7   117032        117066
   8   117036        117104
   9   117062
   10  117066
   11  117097
   12  117104

The new file must have this structure.

2
5
9
11

Edit: This is my code so far. The list sendfiles contains the paths to the files that have original data. The list receivefiles contains the paths to the files that have less data.

    for x in range(0, len(sendfiles)):      
       f1 = open(sendfiles[x], 'r').readlines()
       f2 = open(receivefiles[x], 'r').readlines()
       path = sendfiles[x][:-len('send.txt')]  # drop the 'send.txt' suffix; str.strip() removes characters, not a suffix
       final_file = path + 'out.txt'
       with open(final_file,'w') as f:
           ...

On Unix I use the diff -f command, but now I must write some Python code to do this.

Edit2: This is an example of the send.txt and this is an example of the receive.txt

zinon
  • possible duplicate of [Compare two different files line by line and write the difference in third file - Python](http://stackoverflow.com/questions/7757626/compare-two-different-files-line-by-line-and-write-the-difference-in-third-file) – fredtantini Sep 26 '14 at 12:37
  • Check [difflib](https://docs.python.org/2/library/difflib.html) – sk11 Sep 26 '14 at 12:50
  • This isn't a question, it's a specification. You're asking us to write a program for you. That's not what this site is for. – Bryan Oakley Sep 26 '14 at 12:53
  • @BryanOakley I'm sorry for not writing my program. No, I do not ask to write me the program but I NEED help! That's all! – zinon Sep 26 '14 at 12:56
  • @zinon: but what kind of help do you need! Do you know how to program at all? Do you know python? Do you know how to read a file? Do you know how to compare strings? Do you know how to write to a file? Do you know how to count line numbers? – Bryan Oakley Sep 26 '14 at 12:59
  • @BryanOakley Yes I know that! I just need a response for my question! Anyway, if you don't want to give me a hand, forget it. Thanks! – zinon Sep 26 '14 at 13:04
  • @fredtantini I tried it but it does not do what I want to. – zinon Sep 26 '14 at 13:05
  • @zinon: I'm just trying to understand what you need help with. You claim to know how to do all the things this project requires, and you claim you don't want us to write the program for you, so how are we supposed to help? You haven't told us what part of the problem you are struggling with. – Bryan Oakley Sep 26 '14 at 13:13
  • @BryanOakley I'm really sorry! Please, see my edited question. – zinon Sep 26 '14 at 13:19
  • In comments to other answers you keep saying "this is too time consuming". Apparently your data set is very big. How big is it? Is the real problem that you don't know how to do this _efficiently_ for large data sets? – Bryan Oakley Sep 26 '14 at 13:41
  • @BryanOakley Please see my second edit for the files. The original file has 155112 lines. I have many of these files to compare. – zinon Sep 26 '14 at 13:45
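
The difflib suggestion in the comments above could look roughly like the sketch below. It is only an illustration: the function name `deleted_line_numbers` and the in-memory sample lists are assumptions, not part of the question. It collects the 1-based numbers of lines in the first sequence that `difflib.SequenceMatcher` reports as deleted or replaced, which is close to what `diff` shows. On files with well over a hundred thousand lines this may still be slow, so treat it as a starting point rather than a tuned solution.

```python
import difflib


def deleted_line_numbers(send_lines, receive_lines):
    """Return 1-based numbers of lines in send_lines that the diff marks
    as deleted or replaced relative to receive_lines."""
    # autojunk=False turns off the "popular element" heuristic, which could
    # otherwise ignore values that repeat often in long inputs.
    matcher = difflib.SequenceMatcher(None, send_lines, receive_lines,
                                      autojunk=False)
    missing = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            # i1:i2 is a 0-based slice of send_lines; convert to 1-based
            missing.extend(range(i1 + 1, i2 + 1))
    return missing


# Sample data copied from the question
send = ["116969", "116973", "116977", "117000", "117004", "117028",
        "117032", "117036", "117062", "117066", "117097", "117104"]
receive = ["116969", "116977", "117000", "117028",
           "117032", "117036", "117066", "117104"]

for number in deleted_line_numbers(send, receive):
    print(number)
```

Note that a "replace" region reports every first-file line in it, even if the same value happens to appear somewhere else in the second file, so the result is position-sensitive in the same way `diff` is.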

1 Answer

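    # Brute force: compare every line of 'a' with every line of 'b'
    f1 = open('a', 'r').readlines()
    f2 = open('b', 'r').readlines()
    out = []    # 1-based numbers of lines in 'a' that are missing from 'b'
    count = 1
    for i in f1:
        flag = False
        for j in f2:
            if i == j:
                flag = True
                break    # match found, no need to scan the rest of 'b'
        if not flag:
            out.append(count)
        count += 1
    for o in out:
        print(o)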

Optimized version (assumes both files are sorted in the same order):

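    # Two-pointer walk over two sorted files. Lines are compared as strings,
    # which matches numeric order only when all values have the same length.
    with open('a') as fa, open('b') as fb:
        f1 = [line.rstrip('\n') for line in fa]
        f2 = [line.rstrip('\n') for line in fb]
    out = []
    indexa = 0
    indexb = 0
    while indexa < len(f1) and indexb < len(f2):
        if f1[indexa] == f2[indexb]:
            indexa += 1
            indexb += 1
        elif f1[indexa] > f2[indexb]:
            # 'b' is behind: advance it until it catches up
            indexb += 1
        else:
            # the current line of 'a' does not exist in 'b'
            out.append(indexa + 1)
            indexa += 1
    # anything left in 'a' after 'b' is exhausted is missing as well
    out.extend(range(indexa + 1, len(f1) + 1))
    for i in out:
        print(i)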
gaurav
  • @guarav Also this is very time consuming! Thanks anyway! – zinon Sep 26 '14 at 13:33
  • This answer could be better with some explanation. – Keith Pinson Sep 26 '14 at 13:46
  • @guarav Can you please briefly explain the idea behind the if-elif loop to me? I mean your idea about indexa and indexb. Thank you in advance! – zinon Sep 26 '14 at 13:56
  • Basically, indexa and indexb are two pointers that track the current line number in each file. If the values on both lines are the same, both indexes can be incremented. If the first file's value is greater, we increment indexb so that the second file catches up; otherwise we increment indexa and store its line number in an array so that the differing lines can be printed. – gaurav Sep 26 '14 at 18:58
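
If the files are not guaranteed to be sorted, or the pointer bookkeeping feels fragile, one alternative is a set-based membership test. The sketch below is a swapped-in technique, not the answer's method. The names `missing_line_numbers` and `write_missing` are hypothetical, and it assumes a line counts as "present" if the same text occurs anywhere in the second file, regardless of position or how many times it repeats.

```python
def missing_line_numbers(send_lines, receive_lines):
    """Return 1-based numbers of lines in send_lines whose text never
    occurs in receive_lines. Both arguments may be lists or open files."""
    # Build the lookup set once; each membership test is then O(1) on average
    received = {line.rstrip("\n") for line in receive_lines}
    return [
        number
        for number, line in enumerate(send_lines, start=1)
        if line.rstrip("\n") not in received
    ]


def write_missing(send_path, receive_path, out_path):
    """Hypothetical helper: compare one pair of files and write one
    missing line number per output line."""
    with open(send_path) as send, open(receive_path) as receive:
        numbers = missing_line_numbers(send, receive)
    with open(out_path, "w") as out:
        out.writelines("%d\n" % n for n in numbers)


# Sample data copied from the question
send = ["116969\n", "116973\n", "116977\n", "117000\n", "117004\n",
        "117028\n", "117032\n", "117036\n", "117062\n", "117066\n",
        "117097\n", "117104\n"]
receive = ["116969\n", "116977\n", "117000\n", "117028\n",
           "117032\n", "117036\n", "117066\n", "117104\n"]

print(missing_line_numbers(send, receive))  # [2, 5, 9, 11]
```

This does a single pass over each file, so the work grows linearly with the number of lines instead of with the product of the two file sizes, as it does in the nested-loop version.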