I am trying to do a reverse DNS lookup for all internal IP addresses to validate the inventory that I have, and I want to do this in Python. My plan is to generate a CSV file with all the internal IP addresses using the following code:

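import ipaddress as ip
import time

import pandas as pd

file_name = '10Dot.csv'

start = time.time()  # start timing the generation step

# Collect every usable host address in 10.0.0.0/8 (network and broadcast excluded).
a = ip.ip_network('10.0.0.0/8')
ip_list = [x.compressed for x in a.hosts()]

df = pd.DataFrame({'IP_Address': ip_list})
df.to_csv(file_name, encoding='utf-8', index=False)

end = time.time()
print(end - start)  # elapsed seconds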

Similarly, I want to generate files for the other internal networks. Then, using the following function, I go through each line of the generated file and do a reverse lookup:

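import socket

def reverse_lookup(host):
    """Return the hostname for an IP address, or "NA" if it cannot be resolved."""
    try:
        lookup = socket.gethostbyaddr(str(host))[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        lookup = "NA"
    return lookup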
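For the other internal networks mentioned above, the addresses could also be produced lazily instead of being written to one file per network first. The following is only a sketch, assuming that "internal" means the three RFC 1918 private ranges; `PRIVATE_NETWORKS`, `iter_private_hosts` and `host_count` are hypothetical names introduced for illustration.

```python
import ipaddress

# Assumption: "internal" means the three RFC 1918 private blocks.
PRIVATE_NETWORKS = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")


def iter_private_hosts(networks=PRIVATE_NETWORKS):
    """Yield every usable host address as a string, one at a time."""
    for cidr in networks:
        # hosts() is lazy, so no 16-million-element list is ever built.
        for host in ipaddress.ip_network(cidr).hosts():
            yield str(host)


def host_count(cidr):
    """Number of usable hosts in an IPv4 network with a prefix shorter than /31."""
    # Subtract the network and broadcast addresses that hosts() skips.
    return ipaddress.ip_network(cidr).num_addresses - 2
```

Because the generator yields one address at a time, it can feed a lookup routine directly, and the intermediate CSV becomes optional.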

If I read the CSV file line by line, it is very slow to get through all the IP addresses. I am trying to use multiple threads that each pick a chunk of the CSV file and run the above function on it line by line. With the 10.0.0.0/8 network I have 16,777,214 rows in the file, so I am thinking of dividing it into 100 parts and generating a final file with each host and its looked-up value. How do I read the CSV file in parallel across the threads and then combine the results into a single file?

Also, if you have a better approach to solving this problem, please let me know.
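One way the chunked, threaded lookup described above might look is sketched below. It is not a definitive implementation: it swaps the "split the CSV into 100 parts" idea for a single thread pool with 100 workers that is fed in batches, and it writes straight into one output file so that no merge step is needed. The names `chunks` and `lookup_network`, the `batch_size` default and the output column names are assumptions made for illustration.

```python
import csv
import ipaddress
import socket
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def reverse_lookup(host):
    """Return the hostname for an IP address, or "NA" if the lookup fails."""
    try:
        return socket.gethostbyaddr(str(host))[0]
    except OSError:
        return "NA"


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def lookup_network(cidr, out_path, workers=100, batch_size=10_000,
                   lookup=reverse_lookup):
    """Resolve every host in `cidr` and write (IP, hostname) rows to one CSV."""
    hosts = (str(h) for h in ipaddress.ip_network(cidr).hosts())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        with open(out_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["IP_Address", "Hostname"])
            # Feeding the pool in batches keeps memory bounded; pool.map
            # returns results in input order, so rows stay aligned.
            for batch in chunks(hosts, batch_size):
                writer.writerows(zip(batch, pool.map(lookup, batch)))
```

A call such as `lookup_network('10.0.0.0/8', '10Dot_lookup.csv')` would then produce the combined file directly. The `workers` value is also the upper bound on simultaneous DNS queries, so it is the knob to turn down if the resolver struggles.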

Steve_Greenwood
  • You could've found it easily https://stackoverflow.com/questions/8424771/parallel-processing-of-a-large-csv-file-in-python – ravi Jun 06 '18 at 04:01
  • Be careful with the DNS bandwidth, you can DOS your local resolver if you have too many parallel threads. I would look into aiodns if you want to optimize the wall clock time. – tripleee Jun 06 '18 at 04:08
  • @tripleee - That's a good point about DOS. Since aiodns does the processing asynchronously, how would the program work if multiple threads are calling the same function? – Steve_Greenwood Jun 06 '18 at 04:29
  • You don't need threads really if you use async. Threads can call the same function in parallel just fine, though you need to think about the integrity of shared data structures between threads. How exactly you manage the parallelism is unimportant as far as DNS is concerned anyway. – tripleee Jun 06 '18 at 04:39
  • As a benchmark, I don't think 100 parallel requests are going to be a problem. I once managed to DOS the company DNS with uncontrolled parallelism (this was with multiprocessing, basically simply `xargs -P10000 -n 1 dig – tripleee Jun 06 '18 at 04:41
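The async approach with bounded parallelism discussed in the comments could be sketched roughly as follows. Note the substitution: this does not use aiodns, but the standard library's `loop.getnameinfo`, which asyncio runs in its default thread pool, so the real parallelism may be lower than `limit`. The names `ptr_lookup` and `lookup_all` and the `limit` default are assumptions for illustration; an aiodns-based coroutine could be passed in as `resolver` instead.

```python
import asyncio
import socket


async def ptr_lookup(ip):
    """Reverse-resolve `ip` through the event loop; return "NA" on failure."""
    loop = asyncio.get_running_loop()
    try:
        # NI_NAMEREQD makes the call fail instead of echoing the IP back.
        name, _service = await loop.getnameinfo((ip, 0), socket.NI_NAMEREQD)
        return name
    except OSError:
        return "NA"


async def lookup_all(ips, limit=100, resolver=ptr_lookup):
    """Resolve `ips` with at most `limit` lookups in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def bounded(ip):
        # The semaphore caps concurrent queries, protecting the resolver.
        async with sem:
            return ip, await resolver(ip)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(ip) for ip in ips))
```

It would be driven with something like `asyncio.run(lookup_all(batch))`, one batch of addresses at a time, since passing all 16,777,214 addresses in a single call would create that many tasks at once.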
