I have a set of about 200,000 IP addresses and 10,000 subnets of the form 1.1.1.1/24. For every IP address I need to check whether it belongs to one of these subnets, but since the dataset is large and I have limited computational power, I would like an efficient implementation.
On searching, one method I found was this (https://stackoverflow.com/a/820124/7995937):
from netaddr import IPNetwork, IPAddress
if IPAddress("192.168.0.1") in IPNetwork("192.168.0.0/24"):
    print("Yay!")
But since I have to loop over 200,000 IP addresses, and for each address loop over 10,000 subnets, I am unsure whether this is efficient. My first question: is the check "IPAddress() in IPNetwork()" just a linear scan, or is it optimized in some way?
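To make the cost concrete, the brute-force version would look roughly like the sketch below: in the worst case it performs 200,000 × 10,000 = 2,000,000,000 membership tests. This is only an illustration. It uses the standard-library ipaddress module as a stand-in for netaddr, and the helper name any_subnet_contains and the sample data are made up.

```python
import ipaddress

def any_subnet_contains(ip_str, networks):
    """Return True if ip_str falls inside at least one network (linear scan)."""
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in networks)

# Hypothetical sample data standing in for the real lists.
subnets = ["192.168.0.0/24", "10.0.0.0/8"]
addresses = ["192.168.0.1", "172.16.0.1"]

# Parse each subnet once, outside the loop over addresses.
# strict=False accepts inputs such as 1.1.1.1/24 that have host bits set.
networks = [ipaddress.ip_network(s, strict=False) for s in subnets]

matches = [a for a in addresses if any_subnet_contains(a, networks)]
print(matches)  # ['192.168.0.1']
```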
The other solution I came up with was to build a list of all the IPs contained in the subnets (about 13,000,000 IPs without duplicates) and then sort it. With that, in my loop over the 200,000 IP addresses I only need one binary search per IP, although over a much larger list of addresses.
ip_list = []
for ipMasked in ipsubnets:  # ipsubnets is the list of all subnets
    setUnmaskedIPs = [str(ip) for ip in IPNetwork(ipMasked)]
    ip_list.extend(setUnmaskedIPs)  # extend in place instead of rebuilding the list
ip_list = list(set(ip_list))  # eliminate duplicates
ip_list.sort()  # note: strings sort lexicographically, not numerically
I could then just perform binary search in the following manner:
for ip in myIPList:  # myIPList is the list of 200,000 IPs
    if bin_search(ip, ip_list):
        print('The ip is present')
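bin_search is not shown above. A minimal sketch of what it could look like, assuming the standard-library bisect module and a list sorted in the same string order used for the lookups (the sample data is hypothetical):

```python
from bisect import bisect_left

def bin_search(item, sorted_items):
    """Return True if item occurs in sorted_items (must be sorted ascending)."""
    i = bisect_left(sorted_items, item)
    return i < len(sorted_items) and sorted_items[i] == item

# Hypothetical data: strings in lexicographic order, as ip_list.sort() leaves them.
ip_list = sorted(["192.168.0.1", "10.0.0.2", "192.168.0.10"])

print(bin_search("10.0.0.2", ip_list))  # True
print(bin_search("10.0.0.3", ip_list))  # False
```

Each lookup then costs O(log n) comparisons, roughly 24 for a list of 13,000,000 entries.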
Is this method more efficient than the first one? Or is there a more efficient way to perform this task?