
I would like to compare two text files that have three columns each. One file has 999 rows and the other has 757 rows. I want the 242 rows that differ to be stored in a separate file. I created the first file (999 rows) with a random network generator: each row is an edge, where the first and second columns are the source and destination nodes and the third column is the weight between them.

File Format - Files 1, 2

1 3 1
16 36 1

I have tried

the suggestions in "Compare two files line by line and generate the difference in another file" and "find difference between two text files with one item per line", and http://www.daniweb.com/software-development/python/threads/124932/610058#post610058

Neither worked for me.

I think the problem is in the string comparison. I would like to compare the numbers in the first and second columns. If both are different, I want to write the row to a third file.

Any help will be much appreciated!

Update

I am posting the following code that I tried after @MK posted his comment.

f = open("results.txt","w")

for line in file("100rwsnMore.txt"):
    rwsncount += 1
    line = line.split()
    src = line[0]
    dest = line[1]
    for row in file("100rwsnDeleted.txt"):
        row = row.split()
        s = row[0]
        d = row[1]
        if(s != src and d != dest):
             f.write(str(s))
             f.write(' ')
             f.write(str(d))
             f.write('\n')

f.close()
martineau
learner

1 Answer


The best general-purpose option if you're on a *nix system is just to use:

sort filea fileb | uniq -u

But if you need to use Python:

Your code reopens the inner file in every iteration of the outer file. Open it outside the loop.
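As a rough sketch of that first fix only (not the final approach), the second file can be read once into a list before the outer loop starts. The function name `rows_missing_from` and the in-memory sample rows are made up for illustration; the function accepts any iterable of lines, so open file objects would work too. Note that it compares `src`/`dest` in the order given and is still quadratic, because every row is checked against the whole list.

```python
def rows_missing_from(more_lines, deleted_lines):
    """Return (src, dest) pairs from more_lines with no match in deleted_lines."""
    # Read the second input exactly once, before the outer loop starts.
    deleted = [tuple(row.split()[:2]) for row in deleted_lines if row.strip()]

    missing = []
    for line in more_lines:
        if not line.strip():
            continue  # skip blank lines
        src, dest = line.split()[:2]
        # A row counts as missing only if it matches none of the deleted rows.
        if not any((s, d) == (src, dest) for s, d in deleted):
            missing.append((src, dest))
    return missing


# Hypothetical sample data standing in for the two files.
more = ["1 3 1\n", "16 36 1\n", "2 5 1\n"]
deleted = ["1 3 1\n", "2 5 1\n"]
print(rows_missing_from(more, deleted))  # [('16', '36')]
```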

A nested loop is also less efficient than looping over the first file once, storing the values found, and then checking the second file against those stored values.

def build_set(filename):
    # A set stores a collection of unique items.  Both adding items and searching for them
    # are quick, so it's perfect for this application.
    found = set()

    with open(filename) as f:
        for line in f:
            # [:2] gives us the first two elements of the list.
            # Tuples, unlike lists, cannot be changed, which is a requirement for anything
            # being stored in a set.
            found.add(tuple(sorted(line.split()[:2])))

    return found

set_more = build_set('100rwsnMore.txt')
set_del = build_set('100rwsnDeleted.txt')

with open('results.txt', 'w') as out_file:
    # Using with to open files ensures that they are properly closed, even if the code
    # raises an exception.

    for res in (set_more - set_del):
        # The - computes the elements in set_more not in set_del.

        out_file.write(" ".join(res) + "\n")
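The code above writes only the two node columns. If the weight column should be kept as well, one possible variation (a sketch, not part of the original answer) is to build an order-independent key per row and keep the full row text. The names `edge_key` and `rows_only_in_first` are hypothetical, the rows are assumed to start with two integer node IDs, and the sample lists stand in for the two files.

```python
def edge_key(line):
    """Return an order-independent key for the edge on this line."""
    src, dest = line.split()[:2]
    # frozenset treats "11 86" and "86 11" as the same undirected edge.
    return frozenset((int(src), int(dest)))


def rows_only_in_first(more_lines, deleted_lines):
    """Return full rows of more_lines whose edge is absent from deleted_lines."""
    seen = {edge_key(row) for row in deleted_lines if row.strip()}
    return [line.rstrip("\n") for line in more_lines
            if line.strip() and edge_key(line) not in seen]


# Hypothetical sample data standing in for the two files.
more = ["1 3 1\n", "11 86 1\n", "16 36 1\n"]
deleted = ["3 1 1\n", "86 11 1\n"]
print(rows_only_in_first(more, deleted))  # ['16 36 1']
```

Unlike the set difference, this keeps the rows in the order they appear in the first input.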
Zack Bloom
  • Thank you for the reply, Zack. Not every line from the 100rwsnDeleted file is deleted. The line count is 350 as opposed to 242. I created the 100rwsnDeleted file from the 100rwsnMore file initially by comparing the 'More' file with another file (with 242 rows). I just wanted to make it clear that all rows in the 100rwsnDeleted file are derived from the 100rwsnMore file. – learner Oct 13 '11 at 17:58
  • @bhanu I just made a fix which should cause it to operate the way you expect, give it a shot. – Zack Bloom Oct 13 '11 at 18:03
  • I tried it. It added too many lines. The count is 942. I will try uploading the text files. – learner Oct 13 '11 at 18:17
  • Ah, I think I see, I changed the operator in the set comparison, give it a shot. – Zack Bloom Oct 13 '11 at 18:20
  • Nothing changed. I must mention that if the larger file has 11 86 1 as a row and the second file has 86 11 1 as a row, it must be considered a common edge and should not appear in the results.txt file. – learner Oct 13 '11 at 18:23
  • In the sense that, the edge is bidirectional and not unidirectional (graphs). These rows are essentially edges in a network represented as a graph. *If it was not clear to you already*. – learner Oct 13 '11 at 18:24
  • Yes, that would be important to mention. I added sorting of the numbers so they will compare correctly irrespective of order. – Zack Bloom Oct 13 '11 at 18:25
  • Thank you Zack, sorry for not explicitly mentioning it before. You've been of great help! It worked!! Thank you :) – learner Oct 13 '11 at 18:28