Python removing duplicate lines

Question

I have written the following code to get all of the IP addresses out of a file and print them:

with open("C:\\users\\joey\\desktop\\access.log",'r') as bestand:
    for line in bestand:
        try:
            splittedline = line.split('sftp-session')[1].split("[")[1].split("]")[0]
        except Exception:
            continue
        print(splittedline)

The following code prints all of the IP addresses in another file:

with open("C:\\users\\joey\\desktop\\exit_nodes.csv",'r') as bestand:
    for line in bestand:
        print(line.strip())

How can I compare the two files so that only unique IP addresses are shown and the duplicates are removed?

At the moment the output looks like this:

217.172.190.19
217.210.165.43
218.250.241.229
223.18.115.229
223.133.243.101

In Python, to remove duplicates, you normally convert to a `set()`. - Cyrbil, Nov 25 '15 at 12:42
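A minimal illustration of the comment's suggestion, using a few made-up addresses: building a `set` drops repeats but does not keep the original order, while `dict.fromkeys` (dicts keep insertion order from Python 3.7 onward) removes duplicates and keeps the first occurrence of each.

```python
# Made-up sample data with one repeated address
ips = ["217.172.190.19", "223.18.115.229", "217.172.190.19", "218.250.241.229"]

# A set removes duplicates, but its iteration order is arbitrary
unique_unordered = set(ips)

# dict.fromkeys keeps the first occurrence of each key, in order (Python 3.7+)
unique_ordered = list(dict.fromkeys(ips))

print(len(unique_unordered))  # 3
print(unique_ordered)         # ['217.172.190.19', '223.18.115.229', '218.250.241.229']
```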

score 2 · Accepted Answer · answered Nov 25 '15 at 12:48

If the order is not important, use a set:

ips_1 = set()

with open("C:\\users\\joey\\desktop\\access.log",'r') as bestand:
    for line in bestand:
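        try:
            # Take the text between "[" and "]" that follows "sftp-session"
            ips_1.add(line.split('sftp-session')[1].split("[")[1].split("]")[0])
        except IndexError:
            # The line does not contain an address in the expected format
            continue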

ips_2 = set()
with open("C:\\users\\joey\\desktop\\exit_nodes.csv",'r') as bestand:
    for line in bestand:
        # strip() removes the trailing newline so entries can match those from access.log
        ips_2.add(line.strip())
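The same approach can be sketched as small functions that take any iterable of lines (an open file works, as does a list), which makes it easy to try without the real files. The helper names `extract_ip`, `ips_from_log` and `ips_from_list`, and the sample log lines, are hypothetical: the exact format of `access.log` is not shown in the question, so the samples only assume that the address appears in square brackets somewhere after `sftp-session`, which is what the split chain expects.

```python
def extract_ip(line):
    """Hypothetical helper: return the text between the first '[' and ']'
    after 'sftp-session', or None if the line does not have that shape."""
    try:
        return line.split('sftp-session')[1].split('[')[1].split(']')[0]
    except IndexError:
        return None


def ips_from_log(lines):
    # Collect addresses from log lines, skipping lines without a match
    ips = set()
    for line in lines:
        ip = extract_ip(line)
        if ip is not None:
            ips.add(ip)
    return ips


def ips_from_list(lines):
    # One address per line; strip() removes the trailing newline and spaces,
    # and blank lines are skipped
    return {line.strip() for line in lines if line.strip()}


# Hypothetical sample lines standing in for the two files
log_lines = [
    "Nov 25 12:00:01 host sftp-session [217.172.190.19] opened\n",
    "Nov 25 12:00:02 host sshd: unrelated line\n",
    "Nov 25 12:00:03 host sftp-session [223.18.115.229] opened\n",
]
csv_lines = ["223.18.115.229\n", "218.250.241.229\n"]

ips_1 = ips_from_log(log_lines)
ips_2 = ips_from_list(csv_lines)
print(sorted(ips_1))
print(sorted(ips_2))
```

With the real files, the same functions can be called as `ips_from_log(bestand)` and `ips_from_list(bestand)` inside the existing `with open(...)` blocks.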

You can then use set methods to see which IPs are in both files, which are only in one file, or to get all unique IPs:

Which IPs are in both files?

ips_1.intersection(ips_2)

Which IPs are only in file 1?

ips_1.difference(ips_2)

All unique IPs (from either file):

ips_1.union(ips_2)
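Putting these operations together on two small made-up sets: the operators `&`, `-` and `|` are equivalent to the methods above, and `^` (`symmetric_difference`) gives the addresses that appear in exactly one of the two files, which may be what "only show unique IP addresses" means in the question. Because sets are unordered, the result is sorted with `ipaddress.ip_address` as the key (Python 3.3+), so addresses print in numeric rather than string order; this assumes every entry is a stripped, valid IPv4 address, since mixing IPv4 and IPv6 or leaving a newline in a string would raise an error.

```python
import ipaddress

# Made-up sample sets standing in for the two files
ips_1 = {"217.172.190.19", "223.18.115.229", "9.9.9.9"}
ips_2 = {"223.18.115.229", "218.250.241.229"}

in_both = ips_1 & ips_2         # same as ips_1.intersection(ips_2)
only_in_1 = ips_1 - ips_2       # same as ips_1.difference(ips_2)
all_unique = ips_1 | ips_2      # same as ips_1.union(ips_2)
in_exactly_one = ips_1 ^ ips_2  # same as ips_1.symmetric_difference(ips_2)

# Sort numerically by address rather than as strings before printing
for ip in sorted(all_unique, key=ipaddress.ip_address):
    print(ip)
```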
