1

I am trying to make a number of GET requests (between 1000 and 2000) to an API. So far it takes more than 5 mins and the MySQL server just closes my connection.

I am trying to make it in less than a minute. It should be possible ?

Here is what I have so far:

def get_data(devices):

        for dev in devices: #array containing around 1000 devices
            model = str(dev[0])
            brand = str(dev[1])    
            model = model.replace(" ", "%20")
            brand = brand.replace(" ","%20")

            os = urllib2.urlopen('https://api.com/getData?&brand=' + brand + '&model='+ model).read()
            xmldoc = minidom.parseString(os)

            for element in xmldoc.getElementsByTagName('name'):
                print (element.firstChild.nodeValue)
Nir Alfasi
  • 53,191
  • 11
  • 86
  • 129
user4554133
  • 75
  • 1
  • 9
  • Maybe look into multithreading. Here is a good starting place: http://stackoverflow.com/questions/2846653/python-multithreading-for-dummies/28463266#28463266 – Ryan Apr 26 '15 at 15:15
  • @Ryan it won't help - it's the server that's choking. If you want to improve the server performance there are many ways to do it - start by looking into caching. If the data is not big (a.k.a can be help in-memory) then there is no reason to send a query to the DB upon every request! (or at all...) – Nir Alfasi Apr 26 '15 at 15:18
  • increase the mysql connection timeout values? re-establish the connection when you discover it drops? bulk insert at the end rather than every result? – pala_ Apr 26 '15 at 15:19
  • Modify the server so it takes more than one ID per request and replies with more than one dataset per response. – Tomalak Apr 26 '15 at 16:50
  • httpS costs something. xml costs something. – Rick James Apr 26 '15 at 16:52
  • 1
    %20 is not the only thing that should be handled when building a url. – Rick James Apr 26 '15 at 16:52

1 Answers1

0

The bottleneck of you code maybe the network I/O, if you want to do that faster, you may try to use gevent lib.

I used that to do a lot of traceroute of IP addresses, it's faster than multithreading. Here is some demo code to help you start.

lqhcpsgbl
  • 3,694
  • 3
  • 21
  • 30