
I need to import CSV data, which a URL streams to me in chunks, into a MongoDB server. I have tried the following approach in Python:

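import csv
import urllib2  # Python 2; in Python 3 this module is urllib.request

response = urllib2.urlopen(url)
if seperator:
    cr = csv.DictReader(response, delimiter=seperator, quoting=csv.QUOTE_ALL)
else:
    cr = csv.DictReader(response)
# DictReader already consumes the header row for its field names,
# so this call discards the first data row
cr.next()
ct = 0
#1 make the data objects
rows = list(cr)  # reads the whole response into memory
totalrows = len(rows)
for i, row in enumerate(rows):
    # I am creating the mongo documents here
    pass
# once the documents are ready insert them into respective mongo collections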
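Calling `list(cr)` holds every parsed row of a ~200 MB file in memory at once, which may account for much of the ~70% memory usage. A minimal sketch of the alternative, assuming the rows only need to be handed on in groups: iterate the reader lazily and yield fixed-size batches, so memory stays bounded by the batch size. It is written for Python 3 (where `urllib2` became `urllib.request`); `batched_rows` and the default batch size of 1000 are illustrative choices, not part of the original script.

```python
import csv
import io
from itertools import islice


def batched_rows(reader, batch_size=1000):
    """Yield lists of at most batch_size rows from any row iterator.

    Only one batch is held in memory at a time, unlike list(reader).
    """
    iterator = iter(reader)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Stand-in for the HTTP response: a small in-memory CSV.
    sample = io.StringIO("name,qty\na,1\nb,2\nc,3\n")
    for batch in batched_rows(csv.DictReader(sample), batch_size=2):
        print(batch)
```

With a real feed, the reader could wrap the response, e.g. `csv.DictReader(io.TextIOWrapper(urllib.request.urlopen(url), encoding="utf-8"))`, assuming the feed is UTF-8; each batch could then go to a single bulk call such as PyMongo's `insert_many` instead of one insert per document.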

Since I have a large number of URLs, each serving almost 200 MB of data, I tried multiprocessing, but my script still takes too long: 5-7 hours to import 0.6 million records, with only 5-10% CPU usage and about 70% memory usage. My server has a 4-core CPU and 8 GB of RAM. How can I get the best performance out of Python for this?
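CPU usage of 5-10% suggests the workers spend most of their time waiting on the network or on MongoDB rather than computing. For I/O-bound work like this, a thread pool is a common, lighter-weight substitute for multiprocessing. A sketch, assuming a per-URL function `import_one(url)` (hypothetical, standing in for the download, parse and insert steps above) that returns the number of rows it imported; `max_workers=8` is an arbitrary starting point to tune, not a measured value.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def import_all(urls, import_one, max_workers=8):
    """Run import_one(url) for every URL on a pool of threads.

    Threads suit this workload because they mostly block on network and
    database I/O, during which CPython releases the GIL.
    Returns a dict mapping each URL to its result, or to the exception it raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(import_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results


if __name__ == "__main__":
    import time

    def fake_import(url):
        # Simulates an I/O-bound import: sleeps instead of downloading.
        time.sleep(0.1)
        return len(url)

    urls = ["http://example.com/a.csv", "http://example.com/bb.csv"]
    print(import_all(urls, fake_import))
```

PyMongo's `MongoClient` is thread-safe, so the threads can share one client and its connection pool rather than each opening its own connection.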

Adesh Pandey
  • What is the transfer speed you get from the server? And how many URLs are you doing at once? – zxq9 Apr 10 '15 at 12:23
  • Sorry @zxq9, but I don't know how to measure the transfer speed from the server; if you tell me how, I will post that too. – Adesh Pandey Apr 10 '15 at 12:28
    If you just download a URL from the server directly, for example with wget, what is the average speed? The direction I am heading with this is to find a way to batch the procedure in larger chunks so that the network and the procedure don't block one another. It is unlikely that the speed of the Python code is the issue, and it's also probably not that the server is slow (though the network may be, which is part of the reason for finding out the transfer speed). – zxq9 Apr 10 '15 at 12:30
  • Is this post useful? http://stackoverflow.com/questions/3490173/how-can-i-speed-up-fetching-pages-with-urllib2-in-python – Alex Apr 10 '15 at 14:08
  • @JacodeGroot After reading this post, I guess the problem is not my Python code itself but MongoDB and the third-party website I am getting the CSV data from. – Adesh Pandey Apr 10 '15 at 14:33
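One way to answer zxq9's question without wget is to time the download alone, with no parsing or inserts, and compare it with the total run time; if the download by itself takes most of the hours, the bottleneck is the remote site rather than MongoDB. A minimal sketch, assuming any file-like object with a `read(n)` method, such as the object returned by `urlopen`; `measure_throughput` and the 1 MiB chunk size are illustrative.

```python
import io
import time


def measure_throughput(stream, chunk_size=1 << 20):
    """Read stream to EOF in chunks, discarding the data.

    Returns (total_bytes, elapsed_seconds, megabytes_per_second).
    """
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


if __name__ == "__main__":
    # In-memory stand-in; with a real feed, pass urllib.request.urlopen(url).
    data = io.BytesIO(b"x" * 5_000_000)
    total, elapsed, rate = measure_throughput(data)
    print(f"{total} bytes in {elapsed:.3f} s ({rate:.1f} MB/s)")
```

As a rough scale: at 200 MB per URL, a rate of 1 MB/s would mean more than three minutes per file before MongoDB is involved at all.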

0 Answers