
I am trying to make concurrent API calls with Python. I based my code on the solution (first answer) presented in this thread: What is the fastest way to send 100,000 HTTP requests in Python?

Currently, my code is broken. I have a main function which creates the queue, populates it, creates and starts the threads, and joins the queue. I also have a target function which should make the GET requests to the API.

The difficulty I am experiencing right now is that the target function does not do the necessary work. The target is called, but it acts as if the queue were empty. The first print ("inside scraper worker") is executed, while the second ("inside scraper worker, queue not empty") is not.

import requests
from queue import Queue
from threading import Thread

def main_scraper(flights):
    print("main scraper was called, got: ")
    print(flights)
    data = []
    q = Queue()
    map(q.put, flights)
    for i in range(5):
        t = Thread(target=scraper_worker, args=(q, data))
        t.daemon = True
        t.start()
    q.join()
    return data

def scraper_worker(q, data):
    print("inside scraper worker")
    while not q.empty():
        print("inside scraper worker, queue not empty")
        f = q.get()
        # kiwi_url() and parseResults() are helpers defined elsewhere in my project
        url = kiwi_url(f)
        response = requests.get(url)
        response_data = response.json()
        results = parseResults(response_data)
        q.task_done()
        print("task done. results:")
        print(results)
        #f._price = results[0]["price"]
        #f._url = results[0]["deep_link"]
        data.append(results)
    return data

I hope this is enough information for you to help me out. Otherwise, I will rewrite it as a self-contained example that anyone can run.

Rafael Marques

1 Answer


I would guess that the flights are never being put on the queue. In Python 3, map() is lazy: it returns an iterator, and since the result of map(q.put, flights) is never consumed, q.put is never actually called, so it is as if it didn't happen. I would just iterate.

def main_scraper(flights):
    print("main scraper was called, got: ")
    print(flights)
    data = []
    q = Queue()
    for flight in flights:
        q.put(flight)
    for i in range(5):
        t = Thread(target=scraper_worker, args=(q, data))
        t.daemon = True
        t.start()
    q.join()
    return data
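
To see the laziness directly, here is a minimal standalone sketch (not part of the scraper; it only uses the standard library) showing that map() in Python 3 builds an iterator and does no work until that iterator is consumed:

from queue import Queue

q = Queue()
items = [1, 2, 3]

# map() only builds a lazy iterator; q.put has not been called yet.
m = map(q.put, items)
print(q.qsize())  # 0 - the queue is still empty

# Consuming the iterator is what actually triggers the q.put calls.
list(m)
print(q.qsize())  # 3 - all three items are now on the queue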
Boyd Johnson
  • Yeah, that was the problem. I created a new post about the map function after I discovered the issue. What do you mean by map being lazy? Related post: https://stackoverflow.com/questions/49039676/py-queue-library-mapping-array-to-queue-py3-issues – Rafael Marques Mar 02 '18 at 00:41
  • https://stackoverflow.com/questions/40015439/why-does-map-return-a-map-object-instead-of-a-list-in-python-3 This post has some good thoughts on how maps are lazy. What I mean is that they don't execute right away; the work happens later, when the iterator is accessed. – Boyd Johnson Mar 02 '18 at 00:59
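
As a footnote to the comments above: if you wanted to keep the map style, forcing the iterator would also put the items on the queue, although the explicit for loop states the intent more clearly. A sketch, assuming q and flights are as in the question:

from collections import deque

# Forces evaluation, but builds a throwaway list of None values.
list(map(q.put, flights))

# Consumes the iterator without building a list (the itertools "consume" recipe).
deque(map(q.put, flights), maxlen=0)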