
I'm looking to send 100K-300K POST requests to an API endpoint. These requests originate from a list of JSON objects that I'm iterating over. Unfortunately, the maximum chunk size I'm able to use is 10 events at a time, which greatly slows down sending off all the events I want. After I have my list of JSON objects defined:

import json

import requests

chunkSize = 10
for i in xrange(0, len(list_of_JSON), chunkSize):
    chunk = list_of_JSON[i:i + chunkSize]  # batch of up to 10 events
    endpoint = ""
    # serialize the batch as a JSON array; note the payload goes into the
    # query string as-is (it is not URL-encoded)
    str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
    try:
        url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
        r = requests.post(url)
        print r.content
        print i
    except requests.RequestException:
        print 'failed'

This process sends off the events extremely slowly. I've looked into multithreading, concurrency, and parallel processing, although I'm completely new to the topic. After some research, I've come up with this ugly snippet:

import json
import logging
import threading
import time

import requests

logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',
                    )

chunkSize = 10
# split point rounded down to a chunk boundary so the two halves
# neither overlap nor skip any events
midpoint = (len(list_of_JSON) / 2 / chunkSize) * chunkSize


def worker():
    # sends the first half of list_of_JSON in chunks of 10
    logging.debug('Starting')
    for i in xrange(0, midpoint, chunkSize):
        chunk = list_of_JSON[i:i + chunkSize]
        endpoint = ""
        str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
        try:
            url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
            r = requests.post(url)
            print r.content
            print i
        except requests.RequestException:
            print 'failed'
    time.sleep(2)
    logging.debug('Exiting')


def my_service():
    # sends the second half of list_of_JSON in chunks of 10
    logging.debug('Starting')
    for i in xrange(midpoint, len(list_of_JSON), chunkSize):
        chunk = list_of_JSON[i:i + chunkSize]
        endpoint = ""
        str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
        try:
            url = base_api + endpoint + "?api_key=" + api_key + "&event=" + str_event
            r = requests.post(url)
            print r.content
            print i
        except requests.RequestException:
            print 'failed'
    time.sleep(3)
    logging.debug('Exiting')


t = threading.Thread(target=my_service)
w = threading.Thread(target=worker)

w.start()
t.start()

# wait for both halves to finish before exiting
w.join()
t.join()

Would appreciate any suggestions or refactoring.

Edit: I believe my implementation accomplishes what I want. I've looked over [What is the fastest way to send 100,000 HTTP requests in Python?](http://stackoverflow.com/questions/2632520/what-is-the-fastest-way-to-send-100-000-http-requests-in-python) but am still unsure of how pythonic or efficient this solution is.
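
For comparison, here is a minimal sketch of the same chunked sender driven by a thread pool (multiprocessing.dummy, which exposes the multiprocessing API on top of threads). It assumes the same list_of_JSON, base_api and api_key as above, and is only meant to illustrate the pool pattern from that linked question, not a drop-in replacement:

import json
from multiprocessing.dummy import Pool  # a pool of threads, not processes

import requests

chunkSize = 10
chunks = [list_of_JSON[i:i + chunkSize]
          for i in xrange(0, len(list_of_JSON), chunkSize)]

def send_chunk(chunk):
    # same URL construction as above: one request per chunk of up to 10 events
    str_event = "[" + ",".join(json.dumps(x) for x in chunk) + "]"
    url = base_api + "?api_key=" + api_key + "&event=" + str_event
    try:
        return requests.post(url).status_code
    except requests.RequestException:
        return 'failed'

pool = Pool(20)  # 20 worker threads; tune this to what the API tolerates
for result in pool.imap_unordered(send_chunk, chunks):
    print result
pool.close()
pool.join()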

  • Possible duplicate of [What is the fastest way to send 100,000 HTTP requests in Python?](http://stackoverflow.com/questions/2632520/what-is-the-fastest-way-to-send-100-000-http-requests-in-python) – Wajahat Aug 17 '16 at 20:29
  • This is pretty old, I would suggest looking into something like [eventlet](https://github.com/eventlet/eventlet). There are basic examples in the documentation that do exactly what you're asking. – Thtu Aug 17 '16 at 20:34
  • checking out eventlet, thanks – astateofsanj Aug 17 '16 at 20:55
  • Actually the first answer on that link uses threading which is still relevant. As well as `twisted` which provides async requests. `multiprocessing` can be used in the same structure as the first answer instead of `threading` as well. And just a note: as long as the request generation follows a closed loop design, the request rate will be limited by the number of concurrent threads as well as the response time of each request. Async can avoid that but it is generally non-trivial to understand. – Wajahat Aug 17 '16 at 21:39

1 Answer


You could use Scrapy, which is built on Twisted (as suggested in the comments). Scrapy is a framework intended for scraping web pages, but you can use it to send POST requests too. A spider that achieves the same as your code would look more or less like this:

import json
from urllib import urlencode  # Python 2; on Python 3 it lives in urllib.parse

import scrapy


class EventUploader(scrapy.Spider):
    name = 'event_uploader'  # every spider needs a name
    BASE_URL = 'http://stackoverflow.com'  # Example url

    def start_requests(self):
        for chunk in list_of_JSON:
            get_parameters = {
                'api_key': api_key,
                'event': json.dumps(chunk),  # json.dumps can encode lists too
            }

            url = "{}/endpoint?{}".format(
                self.BASE_URL, urlencode(get_parameters))
            yield scrapy.FormRequest(url, formdata={}, callback=self.next)

    def next(self, response):
        # here you can assert everything went ok
        pass

Once your spider is in place, you can use Scrapy's settings and middlewares to throttle your requests. You'd run your uploader like this:

scrapy runspider my_spider.py
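
If the endpoint needs rate limiting, one option (a sketch only; the values below are placeholders you'd tune for your API) is to put the standard Scrapy throttling settings directly on the spider via its custom_settings attribute:

class EventUploader(scrapy.Spider):
    # example values only; adjust them to what the API tolerates
    custom_settings = {
        'CONCURRENT_REQUESTS': 20,     # how many requests are kept in flight at once
        'DOWNLOAD_DELAY': 0,           # fixed delay between requests, in seconds
        'AUTOTHROTTLE_ENABLED': True,  # back off automatically when responses slow down
        'RETRY_TIMES': 3,              # retry failed requests a few times
    }
    # ... start_requests() and the rest of the spider as above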

– Marco Lavagnino