
I have written the script below, which processes two files supplied by the user and generates a third result file.

The script runs without errors, but it becomes slow when the input files are large. In my testing, InputFile-1 had 500,000 records and InputFile-2 had 100 records.

Is there any way to optimize it and reduce the overall execution time? Kindly share your thoughts!

Thanks in advance.

import ipaddress
filePathName1 = raw_input('InputFile-1 : ')
filePathName2 = raw_input('InputFile-2: ')

ipLookupResultFileName = filePathName1 + ' - ResultFile.txt'
ipLookupResultFile = open(ipLookupResultFileName,'w+')

with open(filePathName1,'r') as ipFile:
    with open(filePathName2,'r') as ipCidrRangeFile:
        for everyIP in ipFile:
            ipLookupFlag = 'NONE'
            ipCidrRangeFile.seek(0)
            for everyIpCidrRange in ipCidrRangeFile:
                if (ipaddress.IPv4Address(unicode(everyIP.rstrip('\n'))) in ipaddress.ip_network(unicode(everyIpCidrRange.rstrip('\n')))) == True:
                    ipLookupFlag = 'True'
                    break
            if ipLookupFlag == 'True':
                ipLookupResultFile.write(everyIP.rstrip('\n') + ' - Valid_Operator_IP' + '\n')
            else:
                ipLookupResultFile.write(everyIP.rstrip('\n') + ' - Not_Valid_Operator_IP' + '\n')

ipFile.close()
ipCidrRangeFile.close()
ipLookupResultFile.close()

Sample records for InputFile-1:

192.169.0.1
192.169.0.6
192.169.0.7

Sample records for InputFile-2:

192.169.0.1/32
192.169.0.6/16
255.255.255.0/32
255.255.255.0/16
192.169.0.7/32

Sample records for ResultFile.txt:

192.169.0.1 - Not_Valid_Operator_IP
192.169.0.6 - Valid_Operator_IP
192.169.0.7 - Not_Valid_Operator_IP
AJNEO999

2 Answers


The starting point is that for every line in ipFile you re-read ipCidrRangeFile. Instead, read ipCidrRangeFile into a list (or some other collection) once and iterate over that inside the loop.

# Read the CIDR ranges once, up front.
with open(filePathName2,'r') as ipCidrRangeFile:
    ipCidrRangeList = ipCidrRangeFile.readlines()

# ipLookupResultFile is opened as in the original script.
with open(filePathName1,'r') as ipFile:
    for everyIP in ipFile:
        ipLookupFlag = 'NONE'
        for everyIpCidrRange in ipCidrRangeList:
            if ipaddress.IPv4Address(unicode(everyIP.rstrip('\n'))) in ipaddress.ip_network(unicode(everyIpCidrRange.rstrip('\n'))):
                ipLookupFlag = 'True'
                break
        if ipLookupFlag == 'True':
            ipLookupResultFile.write(everyIP.rstrip('\n') + ' - Valid_Operator_IP' + '\n')
        else:
            ipLookupResultFile.write(everyIP.rstrip('\n') + ' - Not_Valid_Operator_IP' + '\n')
heroworkshop

A better approach is to load each file once up front, and then use that data to do the processing:

import ipaddress

filePathName1 = raw_input('InputFile-1 : ')
filePathName2 = raw_input('InputFile-2: ')

ipLookupResultFileName = filePathName1 + ' - ResultFile2.txt'

with open(filePathName1) as ipFile:
    ip_addresses = [unicode(ip.strip()) for ip in ipFile]

with open(filePathName2) as ipCidrRangeFile:  
    ip_cidr_ranges = [unicode(cidr.strip()) for cidr in ipCidrRangeFile]

with open(ipLookupResultFileName,'w+') as ipLookupResultFile:
    for ip_address in ip_addresses:
        ipLookupFlag = False
        for cidr_range in ip_cidr_ranges:
            if ipaddress.IPv4Address(ip_address) in ipaddress.ip_network(cidr_range):
                ipLookupFlag = True
                break

        if ipLookupFlag:
            ipLookupResultFile.write("{} - Valid_Operator_IP\n".format(ip_address))
        else:
            ipLookupResultFile.write("{} - Not_Valid_Operator_IP\n".format(ip_address))

Note that using with blocks means you do not need to explicitly close the files afterwards.
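
In the same spirit, the CIDR strings could be parsed into ipaddress.ip_network objects once, up front, so that they are not re-parsed on every comparison. A minimal sketch (not tested against your data), assuming the ip_addresses and ip_cidr_ranges lists from above:

# Parse each CIDR string once instead of once per IP comparison.
ip_networks = [ipaddress.ip_network(cidr) for cidr in ip_cidr_ranges]

with open(ipLookupResultFileName,'w+') as ipLookupResultFile:
    for ip_address in ip_addresses:
        ip = ipaddress.IPv4Address(ip_address)
        if any(ip in network for network in ip_networks):
            ipLookupResultFile.write("{} - Valid_Operator_IP\n".format(ip_address))
        else:
            ipLookupResultFile.write("{} - Not_Valid_Operator_IP\n".format(ip_address))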

Depending on your needs, a further speed improvement could be made by removing any duplicate ip_addresses. This could be done by loading the data into a set() rather than a list, for example:

ip_addresses = set(unicode(ip.strip()) for ip in ipFile)

You could also sort your results by IP address before writing them to a file.
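
For example, a numeric sort (so that 192.169.0.10 sorts after 192.169.0.2 rather than before it) could use ipaddress as the sort key; a minimal sketch, again assuming the ip_addresses list from above, with sorted_ips as an illustrative name:

# Sort numerically by IP address rather than as plain strings.
sorted_ips = sorted(ip_addresses, key=ipaddress.IPv4Address)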

Martin Evans
  • Thanks for the response!! If I use set(), the script execution time is 7 minutes, but it removes all duplicate records, which is not what I need... so I opted for the solution above, which gave an improvement of 7 seconds. – AJNEO999 Mar 08 '17 at 13:21