The simplest way to get all the strings in Resp.txt that are not in Sample.txt is to use set difference:
file_1 = set()
file_2 = set()
with open('Resp.txt', 'r') as f:
    for line in f:
        file_1.add(line.strip())
with open('Sample.txt', 'r') as f:
    for line in f:
        file_2.add(line.strip())
print(file_1 - file_2)
Which returns:
{'export route-policy ABCDE', 'vrf XXX', 'spanning tree enable', '!', '212:43', 'bandwidth 10', 'maximum prefix 12 34', '9:43', '123:45'}
However, this doesn't account for certain rules that apply to Resp.txt, for example:
- If the line is "maximum prefix", ignore the numbers.
These rules can be applied while reading Resp.txt:
import re
file_1 = set()
file_2 = set()
with open('Resp.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line == "!":
            continue
        elif re.match(r'\d+:\d+', line):  # Matches times.
            continue
        elif line.startswith("vrf"):
            line = "vrf"  # Ignore the VRF name.
        elif line.startswith("maximum prefix"):
            line = "maximum prefix"  # Ignore the numbers.
        file_1.add(line)
with open('Sample.txt', 'r') as f:
    for line in f:
        file_2.add(line.strip())
print(file_1 - file_2)
Which returns:
{'export route-policy ABCDE', 'bandwidth 10', 'spanning tree enable'}
This is correct because Sample.txt does not contain route-policy.
These rules could be made more robust, but they should be illustrative enough.
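As one way the rules might be tightened, here is a minimal sketch that moves them into a single function. The name `normalize` and the choice to skip blank lines are my own assumptions, not something from your files. It uses `fullmatch` so that only lines consisting entirely of a time are dropped, and compares whole words so that a line such as "vrfy something" is not mistaken for a vrf line:

```python
import re

# Hypothetical helper; the rules mirror the ones above but are anchored more strictly.
TIME_RE = re.compile(r'\d+:\d+')

def normalize(line):
    """Return the text to compare for a line, or None if the line should be skipped."""
    line = line.strip()
    if not line or line == "!":  # Assumption: blank lines are ignored too.
        return None
    if TIME_RE.fullmatch(line):  # The whole line is a time, e.g. "9:43".
        return None
    words = line.split()
    if words[0] == "vrf":  # Ignore the VRF name.
        return "vrf"
    if words[:2] == ["maximum", "prefix"]:  # Ignore the numbers.
        return "maximum prefix"
    return line

print(normalize("maximum prefix 12 34"))
print(normalize("vrf XXX"))
```

Inside the reading loop you would call `normalize(line)` and add the result to the set only when it is not `None`.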
Keep in mind that a set will only find unique differences, not all of them (say you have multiple 'spanning tree enable' lines and would like to know how many times they are seen). In that case, you could do something more in line with your original code:
import re
file_1 = []
file_2 = []
with open('Resp.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line == "!":
            continue
        elif re.match(r'\d+:\d+', line):
            continue
        elif line.startswith("vrf"):
            line = "vrf"
        elif line.startswith("maximum prefix"):
            line = "maximum prefix"
        file_1.append(line)
with open('Sample.txt', 'r') as f:
    for line in f:
        file_2.append(line.strip())
diff = []
for line in file_1:
    if line not in file_2:
        diff.append(line)
print(diff)
Result:
['export route-policy ABCDE', 'spanning tree enable', 'bandwidth 10']
While this method is slower (although you probably won't notice), it keeps duplicate lines and maintains the order of the lines found.
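If the slowdown ever does matter, one possible variation is to keep the Resp.txt lines in a list (so order and duplicates survive) but look them up in a set built from Sample.txt. The sketch below uses made-up in-memory lines instead of your files, and the function name `ordered_diff` is only illustrative. `collections.Counter` then gives the number of times each differing line was seen:

```python
from collections import Counter

def ordered_diff(resp_lines, sample_lines):
    """Lines from resp_lines that are absent from sample_lines, in order, with duplicates."""
    sample = set(sample_lines)  # Set membership checks avoid rescanning a list each time.
    return [line for line in resp_lines if line not in sample]

# Made-up example data, standing in for the already-cleaned lines of each file.
resp = ["spanning tree enable", "bandwidth 10", "spanning tree enable", "vrf"]
sample = ["vrf", "interface X"]

diff = ordered_diff(resp, sample)
counts = Counter(diff)
print(diff)
print(counts["spanning tree enable"])
```

With this example data, `diff` keeps both 'spanning tree enable' entries and `counts` reports that line twice.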