
I have code like this:

import requests

a = []
for name in df.index.values:
    # one blocking request per row; 'data' avoids shadowing the json module
    data = requests.get('some_url' + name).json()
    data['name'] = name
    a.append(data)

But checking all the items takes about a minute. How can I make the requests asynchronous so it runs faster?

What library will be better to use?

  • I have found [requests-futures](https://github.com/ross/requests-futures) the simplest method of running async requests. It's 4 to 5 lines of code to run everything and get the JSON response back. – roganjosh Sep 05 '17 at 17:46
  • @roganjosh there are a lot of names, so there will be a lot of requests; how should I do it? – Nurislom Rakhmatullaev Sep 05 '17 at 17:49
  • So from what I understand, you're pulling the index out of the df anyway and running this externally from the df? – roganjosh Sep 05 '17 at 17:50
  • @roganjosh you are right. So any idea? – Nurislom Rakhmatullaev Sep 05 '17 at 17:51
  • Yes, I can help you with requests-futures, I need a few mins. You should note though that this is borderline "opinion-based" so a) there are other approaches and b) other approaches may be more appropriate. – roganjosh Sep 05 '17 at 17:53
  • You can use Python's built-in asyncio mechanism: https://docs.python.org/3/library/asyncio-task.html#asyncio.wait – Dharmesh Fumakiya Sep 05 '17 at 17:53
  • Possible duplicate of [Python multiprocess Pool vs Process](https://stackoverflow.com/questions/45844876/python-multiprocess-pool-vs-process) – andrewgu Sep 05 '17 at 18:15
  • @andrewgu I feel that's an unfair dupe target. Not only is the question there asking specifically about multiprocessing (but happens to get an async threaded answer) but it's also only a couple of weeks old... so itself is likely a dupe somewhere along the lines. – roganjosh Sep 05 '17 at 18:19
  • Maybe "unfair" is not the right word. "Misleading" perhaps. – roganjosh Sep 05 '17 at 18:20
  • @roganjosh Sorry if I didn't choose a great target, either that one or a similar recent one came to mind. I found some other, older duplicates when searching, but I didn't think their answers were as helpful as the one I included. – andrewgu Sep 05 '17 at 18:22
  • The target I used just had async requests, and waiting for completion. Others that were much closer duplicates didn't have those similarities. – andrewgu Sep 05 '17 at 18:24
  • @andrewgu I also happen to have commented on the answer for your target and had it bundled in as an edit :) It's a good, comprehensive answer, I just don't know if it makes sense as a dupe. for the reviewers to decide. – roganjosh Sep 05 '17 at 18:24
  • @roganjosh Thanks for the feedback. I'll take it into account in the future. Should I comment with a helpful reference link then mark as duplicate with a better "duplicate" question? – andrewgu Sep 05 '17 at 18:27
  • @andrewgu I'd just leave it as it is tbh, but it's completely up to you. There's no harm in suggesting it as a dupe, that just lets others decide if they agree with you or not. Either way, having the link against this question certainly serves to give further reading. My reservations are not about the content of the answer, only the topic of the question. It really isn't a big deal either way :) – roganjosh Sep 05 '17 at 18:30
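The asyncio route linked in the comments could look roughly like the sketch below. Note that `requests` itself is blocking, so in practice a non-blocking HTTP library such as aiohttp would be needed; `fetch_json` here is a hypothetical stand-in that simulates the network call with a short sleep.

```python
import asyncio

async def fetch_json(name):
    # Hypothetical stand-in for a real non-blocking HTTP call
    # (e.g. via aiohttp); sleep simulates network latency.
    await asyncio.sleep(0.01)
    return {'name': name}

async def main(names):
    # Schedule all requests concurrently, then wait for them all to finish
    tasks = [asyncio.create_task(fetch_json(n)) for n in names]
    await asyncio.wait(tasks)
    # Collect results in the original order of the names
    return [t.result() for t in tasks]

names = ['alice', 'bob']
a = asyncio.run(main(names))
```

Because all the sleeps overlap, the total runtime is close to one request's latency rather than the sum of them all.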

1 Answer


There are several approaches to making asynchronous requests from Python. One simple way is to use the requests-futures library:

from requests_futures.sessions import FuturesSession

# A session backed by a thread pool; up to 10 requests run concurrently
session = FuturesSession(max_workers=10)
names = df.index.values.tolist()
urls = ['some_url' + name for name in names]
# Each get() returns a future immediately; the requests run in the background
fire_requests = [session.get(url) for url in urls]
# result() blocks until the corresponding request has completed
results = [item.result() for item in fire_requests]

You may have to call .json() on each item.result(). I've always used this approach, though it occurs to me now that it relies on the fire_requests list comprehension for its side effect of firing the requests, so it may be better practice to break that part into a standard for loop.
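The same fan-out pattern can also be written with the standard library's concurrent.futures, with no third-party dependency. This is only a sketch: `fetch_json` is a hypothetical stand-in for the `requests.get('some_url' + name).json()` call in the question.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_json(name):
    # Hypothetical stand-in for requests.get('some_url' + name).json();
    # replace the body with the real HTTP call.
    return {'result': name.upper()}

names = ['alice', 'bob', 'carol']

with ThreadPoolExecutor(max_workers=10) as executor:
    # submit() returns futures immediately; the pool runs them concurrently
    futures = [executor.submit(fetch_json, name) for name in names]
    a = []
    for name, future in zip(names, futures):
        data = future.result()   # blocks until this request finishes
        data['name'] = name      # tag each response, as in the original loop
        a.append(data)
```

Submitting everything first and collecting results in a second pass is what lets the requests overlap; calling future.result() inside the submit loop would serialize them again.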

roganjosh