
A list of IP addresses is downloaded to a file and renamed to Old_file. As days go by, the device gets updated with more IPs (or some are deleted). Therefore, I download a new list of IP addresses to another file named New_file.

I then want to compare these two files and see what is not matching.

Old_file = [1.1.1.1, 1.1.1.2, 1.1.1.3, 1.1.1.4, 1.1.1.6,]

New_file = [1.1.1.1, 1.1.1.2, 1.1.1.3, 1.1.1.5, 1.1.1.6]

The return value needs to be 1.1.1.4, and the search stops there. But never from Old_file, e.g. 1.1.1.5 (we need the results from New_file only). I really hope this explains it.

Thanks in advance, Tony

tony2019
  • do both files have equal length (equal number of lines)? I'd check that first... also, do you just want to do a line-by-line comparison or do you want to check if a line in the reference exists *anywhere* in the content to check? – FObersteiner Sep 25 '19 at 09:51
  • Your question is unclear. What does your data look like? (Your examples are not valid Python structures.) Is it just text? In this case, what do you mean by 'return not match first result'? The first non-matching character? The first non-matching line? – Thierry Lathuille Sep 25 '19 at 09:56
  • Take note that Python has a standard library module for comparisons/diffs: https://docs.python.org/3/library/difflib.html – MisterMiyagi Sep 25 '19 at 09:58
  • @MrFuppes files are not the same in length, and yes, I want to do a line-by-line comparison and stop at the point where the new_file doesn't hold the result when compared to the old_file (this is because the device gets updated every day; what's on the device today may be removed tomorrow, a week later or a year later), and I want to find that one – tony2019 Sep 25 '19 at 10:29
  • @Thierry Lathuille, apologies for not being clear. What I am trying to do is compare a list of IPs saved to a text file (Old_file) to the current device IP list (New_file) and find out whether there is a free IP on the device. E.g. Old_file = 1.1.1.1 to 1.1.1.10, New_file = 1.1.1.1 to 1.1.1.3, then 1.1.1.5 to 1.1.1.7 and 1.1.1.9 to 1.1.1.11. My search should stop at 1.1.1.4 and return 1.1.1.4. I really hope this explains it (if not, please accept my apologies) – tony2019 Sep 25 '19 at 10:29
  • You should edit your question and explain what your real problem is: what your files look like exactly, and what you want. Do you want a list of IPs that are present in one file (which one, then) and not in the other one? Are the files sorted? And so on. – Thierry Lathuille Sep 25 '19 at 10:31
  • @tony2019: ok, added a simple line-by-line comparison option to my answer. Although in general I think one should not overcomplicate things, I think ThierryLathuille is asking the right questions... Sometimes it's better to expect things to be complicated ;-) – FObersteiner Sep 25 '19 at 10:39
  • @All, as stated many times, I will edit my question, and do my best to explain. – tony2019 Sep 25 '19 at 11:03

2 Answers


For a simple element-wise comparison, you could do

def get_first_unequal(s0, s1):   
    for e0, e1 in zip(s0, s1): # assumes sequences are of equal length!
        if e0 != e1:
            print(f"unequal elements: '{e0}' vs. '{e1}'!")
            return (e0, e1)
    return None # all equal

a = ['a', 'b', 'c']
b = ['a', 'b', 'd']             
get_first_unequal(a, b)            
# unequal elements: 'c' vs. 'd'!  
# ('c', 'd')

# --> to get a list of all unequal pairs, you could also use
# [(e0, e1) for (e0, e1) in zip(s0, s1) if e0 != e1]
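
If the two sequences can differ in length, which the comments suggest is the case here, one possible variation is to pad the shorter one with `itertools.zip_longest`. This is only a sketch: the function name `first_unequal_padded` and the sample lists are made up for illustration, and it assumes the real elements are strings, so the `None` padding can never equal one of them.

```python
from itertools import zip_longest

def first_unequal_padded(s0, s1):
    # zip_longest pads the shorter sequence with None, so a missing
    # element shows up as a mismatch instead of being silently skipped.
    for i, (e0, e1) in enumerate(zip_longest(s0, s1)):
        if e0 != e1:
            return i, e0, e1  # position and the two differing elements
    return None  # same length and all elements equal

# hypothetical sample data
old = ["1.1.1.1", "1.1.1.2", "1.1.1.3"]
new = ["1.1.1.1", "1.1.1.2"]
print(first_unequal_padded(old, new))
# (2, '1.1.1.3', None)
```

Returning the index as well makes it easier to see whether the mismatch is a changed line or a missing one.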

If you want something more sophisticated, as mentioned in the comments, difflib might be the way to go. For example, to run a comparison of two sequences (these would be the lists of strings you read from the two txt files you want to compare):

import difflib
a = ['a', 'b', 'c']
b = ['s', 'b', 'c', 'd']
delta = difflib.context_diff(a, b)
for d in delta:
    print(d)

gives

*** 1,3 ****
! a
  b
  c
--- 1,4 ----
! s
  b
  c
+ d

To check the difference between two strings, you could do something like this (borrowing from here):

a = 'string1'
b = 'string 2'
delta = difflib.ndiff(a, b)

print(f"a -> b: {a} -> {b}")
for i, d in enumerate(delta):
    if d[0] == ' ':  # no difference
        continue
    elif d[0] == '-':
        print(f"Deleted '{d[-1]}' from position {i}")
    elif d[0] == '+':
        print(f"Added '{d[-1]}' to position {i-1}") 

gives

a -> b: string1 -> string 2
Deleted '1' from position 6
Added ' ' to position 6
Added '2' to position 7
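
For the IP lists described in the question, a membership test may be closer to what is wanted than a positional comparison. The following is a minimal sketch under that reading of the question: it walks the old list in order and stops at the first address that no longer appears in the new list. The function name `first_missing` is hypothetical, and the lists are assumed to have been read from the two files already, one address per string.

```python
def first_missing(old_ips, new_ips):
    new_set = set(new_ips)  # set membership checks are fast on average
    for ip in old_ips:
        if ip not in new_set:
            return ip  # stop at the first address missing from new_ips
    return None  # every old address is still present

# sample data taken from the question
old_ips = ["1.1.1.1", "1.1.1.2", "1.1.1.3", "1.1.1.4", "1.1.1.6"]
new_ips = ["1.1.1.1", "1.1.1.2", "1.1.1.3", "1.1.1.5", "1.1.1.6"]
print(first_missing(old_ips, new_ips))
# 1.1.1.4
```

Unlike the element-wise comparison, this does not depend on both lists having the same length or the same ordering.
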
FObersteiner
  • Thank you MrFuppes, looks like what I am after. Perfect, kudos – tony2019 Sep 25 '19 at 10:38
  • glad I could help! Note though that 1) `zip(s0, s1)` only runs over the *shortest* sequence, and 2) make sure the `return` value in case of all equal elements is according to your needs. – FObersteiner Sep 25 '19 at 10:42

If you're assuming that both files should be exactly identical, you can just iterate over the characters of the first and compare them to the second. I.e.

# Old_file and New_file are assumed to hold the file contents as strings
# check that they're the same length first
if len(Old_file) != len(New_file):
    print('not the same!')
else:
    for indx, old_char in enumerate(Old_file):
        # actually compare the characters
        new_char = New_file[indx]
        if old_char != new_char:
            # the characters do not match
            print('not the same!')
            break  # kill the loop

It's worth noting that there are faster ways to do this. You could look into performing a checksum, though it wouldn't tell you which parts are different, only that they are different. If the files are large, the performance of doing the check one character at a time will be quite bad; in that case you can try instead to compare blocks of data at a time.
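
Both ideas can be sketched with the standard library. The helper names `file_digest` and `first_differing_block`, the choice of SHA-256 and the 64 KiB block size are assumptions made for this illustration, not something the question prescribes.

```python
import hashlib

def file_digest(path, block_size=65536):
    # Hash the file in blocks so large files are never fully loaded.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()

def first_differing_block(path_a, path_b, block_size=65536):
    # Return the byte offset of the first block that differs, or None.
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if block_a != block_b:
                return offset  # also covers one file ending early
            if not block_a:  # both files ended at the same point
                return None
            offset += block_size
```

Comparing `file_digest(path_a) == file_digest(path_b)` only answers whether the files are identical, while `first_differing_block` narrows down where they start to diverge.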

Edit: re-reading your original question, you could definitely do this with a while loop. If you did, I would suggest basically the same strategy of checking each individual character. In that case you would manually need to increment the indx of course.
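
A while-loop version might look roughly like the sketch below. The function name `first_mismatch_index` is made up, and the two arguments are assumed to be the file contents already read into strings.

```python
def first_mismatch_index(old_text, new_text):
    indx = 0
    limit = min(len(old_text), len(new_text))
    # advance while both strings agree at the current position
    while indx < limit and old_text[indx] == new_text[indx]:
        indx += 1  # manual increment, since a while loop has no counter
    if indx == limit and len(old_text) == len(new_text):
        return None  # identical strings
    return indx  # first differing position, or where the shorter one ends

print(first_mismatch_index("1.1.1.4", "1.1.1.5"))
# 6
```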

Andrew F
  • I think @tony2019 wants the program to return the characters that differ between the files... maybe a job for regex... – 3NiGMa Sep 25 '19 at 09:58