I have to import CSV data from a URL, which streams the data in chunks, into a MongoDB server. I have tried the following approach in Python:
import csv
import urllib2

response = urllib2.urlopen(url)  # url and separator come from earlier in my script
if separator != "":
    cr = csv.DictReader(response, delimiter=separator, quoting=csv.QUOTE_ALL)
else:
    cr = csv.DictReader(response)
# DictReader consumes the header row itself, so no extra next() call is needed

# 1. make the data objects
rows = list(cr)
totalrows = len(rows)
for i, row in enumerate(rows):
    # I am creating the mongo documents here
    # once the documents are ready, insert them into the respective mongo collections
    pass
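Would a streaming variant with batched inserts, roughly like the sketch below, help? The pymongo connection, database/collection names, url, and BATCH_SIZE here are placeholders, not my actual code:

import csv
import urllib2
from pymongo import MongoClient

BATCH_SIZE = 1000  # placeholder; tune for the document size

client = MongoClient()                  # assumes a local mongod
collection = client.mydb.mycollection   # placeholder db/collection names
url = "http://example.com/data.csv"     # placeholder url

response = urllib2.urlopen(url)
batch = []
for row in csv.DictReader(response):
    batch.append(dict(row))             # build the mongo document from the row
    if len(batch) >= BATCH_SIZE:
        collection.insert_many(batch)   # one round trip per batch instead of per row
        batch = []
if batch:
    collection.insert_many(batch)       # flush the remainder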
Since I have a large number of URLs, each serving almost 200 MB of data, I tried multiprocessing, but my script still takes too long to execute: 5-7 hours to import 0.6 million records, with only 5-10% CPU usage and ~70% memory usage. My server has a 4-core CPU and 8 GB of RAM. Please suggest how I can achieve the best performance with Python.
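For reference, my multiprocessing attempt looks roughly like this sketch (urls and the body of import_url stand in for my actual URL list and per-URL import logic):

import csv
import urllib2
from multiprocessing import Pool

def import_url(url):
    # one worker per url: stream the csv and insert into mongo as above
    response = urllib2.urlopen(url)
    for row in csv.DictReader(response):
        pass  # build and insert the mongo document here

if __name__ == "__main__":
    urls = []  # placeholder for my actual list of urls
    pool = Pool(processes=4)  # one worker per cpu core
    pool.map(import_url, urls)
    pool.close()
    pool.join()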