
I have a list of IDs that I need to pass into an API.

I have successfully turned each ID into a URL string, so I have a list of ~300K URLs (~300K IDs).

I want to get the text part of each API response back into a list.

I can do this sequentially by taking every ID and passing it into the URL with a for loop, like so:

    import time
    import requests

    L = [1, 2, 3]   # example IDs
    lst = []        # collected response bodies

    for i in L:
        url = 'url&Id={}'.format(i)
        xml_data1 = requests.get(url).text
        lst.append(xml_data1)
        time.sleep(1)
        print(xml_data1)
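Since the connection sometimes drops mid-run, the per-ID request can also be wrapped in a retry with exponential backoff so transient failures don't kill the loop. A minimal sketch; `fetch` is a stand-in for the `requests.get(url).text` call and is not part of the original code:

```python
import time

def fetch_with_retry(fetch, url, retries=3, backoff=1.0):
    """Call fetch(url), retrying with exponential backoff on failure.

    `fetch` stands in for requests.get(url).text here; any callable
    that raises on a transient error will work.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise                             # out of retries: give up
            time.sleep(backoff * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```

With `retries=3` a URL is attempted up to three times before the exception propagates.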

I have been trying to use concurrent.futures and urllib.request to send multiple requests at once, but I keep getting this error:

username=xxxx&password=xxxx&Id=1' generated an exception: 'HTTPResponse' object has no attribute 'readall'

using this code:

lst = ['url.com', 'url2.com']

URLS = lst

import concurrent.futures
import urllib.request

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    conn = urllib.request.urlopen(url, timeout=timeout)
    return conn.readall()  # this is the line that raises the exception

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result() 
            # do json processing here
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
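For the error itself: in Python 3, `urllib.request.urlopen` returns an `http.client.HTTPResponse`, which has a `read()` method but no `readall()` (that name appears in some older examples). A one-line fix to the function above, shown as a sketch:

```python
import http.client
import urllib.request

def load_url(url, timeout):
    conn = urllib.request.urlopen(url, timeout=timeout)
    return conn.read()  # read(), not readall(): HTTPResponse has no readall()

# HTTPResponse exposes read() but not readall(), hence the exception above
assert hasattr(http.client.HTTPResponse, 'read')
assert not hasattr(http.client.HTTPResponse, 'readall')
```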

How can I adapt my for loop, or the code above, to make multiple API calls at once?

I am asking because my connection keeps getting reset with the for loop, and I don't know how to continue from where I left off, in terms of either ID or URL.
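To pick up where a crashed run left off, one option is to persist the index of the last completed URL and skip past it on restart. A minimal sketch, assuming `fetch` is whatever call is made per URL (e.g. `requests.get(url).text`); the checkpoint filename is made up for illustration:

```python
import os

def run(urls, fetch, results, checkpoint='progress.txt'):
    """Fetch each URL, persisting progress so a crashed run can resume.

    `fetch` is a stand-in for the real request call; `checkpoint` is a
    hypothetical filename holding the count of URLs already completed.
    """
    done = 0
    if os.path.exists(checkpoint):        # resume point from a prior run
        with open(checkpoint) as f:
            done = int(f.read() or 0)
    for i in range(done, len(urls)):
        results.append(fetch(urls[i]))
        with open(checkpoint, 'w') as f:  # record progress after each URL
            f.write(str(i + 1))
```

Restarting the script then re-enters the loop at the first unfinished URL instead of at ID 0.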

Using Python 3.6.

Edit:

I applied the code from here: Python requests with multithreading,

where lst is the list of URLs.

import grequests

class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print ("Problem: {}: {}".format(request.url, exception))

    def async(self):
        results = grequests.map((grequests.get(u) for u in self.urls), exception_handler=self.exception, size=5)
        print (results)

test = Test()
test.async()

The code seems to be working, and no error message is given, but how do I append each response.text to a list from within the code?

RustyShackleford

1 Answer


Use grequests, as suggested here: Python requests with multithreading.

It doesn't directly adapt the code you already have, and you will likely have to rewrite it with a different library; however, grequests sounds much more suitable for your needs.

Further to our communication, please see the code below, which illustrates what to change.

import grequests

lst = ['https://www.google.com', 'https://www.google.cz']

class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    def async(self):
        return grequests.map((grequests.get(u) for u in self.urls),
                             exception_handler=self.exception, size=5)

    def collate_responses(self, results):
        return [x.text for x in results]

test = Test()
# here we collect the results returned by the async function
results = test.async()
response_text = test.collate_responses(results)
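One practical point for ~300K URLs: a single `grequests.map` call holds every response in memory at once, which is also why nothing prints for a long time. Mapping in batches may be gentler on memory and gives visible progress. A sketch; the batch size of 1000 is an arbitrary choice, and the grequests calls mirror the answer above. (Note also that `async` became a reserved word in Python 3.7, so the method above would need renaming on newer versions.)

```python
def chunks(seq, size):
    """Yield successive slices of seq, each at most `size` long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def fetch_all_in_batches(urls, batch_size=1000):
    """Map the URLs in batches instead of all at once (requires grequests)."""
    import grequests  # imported lazily so chunks() works without gevent
    texts = []
    for batch in chunks(urls, batch_size):
        results = grequests.map((grequests.get(u) for u in batch), size=5)
        # failed requests come back as None; skip them
        texts.extend(r.text for r in results if r is not None)
    return texts
```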
Swift
  • Thanks for the response, I applied that code before. Please see my edit. – RustyShackleford Aug 21 '18 at 18:26
  • Not sure without an IDE in front of me. Can you show me what print(results) shows you? – Swift Aug 21 '18 at 18:36
  • That's the thing, it doesn't print anything. From what I've read, the code in my edit first gets all the requests, and with ~300K URLs that will take some time. Let me circle back after shortening my list. – RustyShackleford Aug 21 '18 at 18:43
  • Yes, I believe it's mapping the responses into a tuple or something; try a test with only 3 or 4 different URLs. I believe it hasn't printed anything because it is likely still generating your huge tuple of responses. Perhaps split the comprehension into separate elements and then print the request.text each time a response is received. – Swift Aug 21 '18 at 18:45
  • Just tested it with 5 urls and got a list with `[ – RustyShackleford Aug 21 '18 at 19:15
  • tried changing request.url in the code to request.text and got the same list as above – RustyShackleford Aug 21 '18 at 19:15
  • That's a good sign. That means you are getting the requests and responses. I'm at a PC now so give me 2 secs :) – Swift Aug 21 '18 at 19:19
  • another question is, from `results` variable how do I append every request.text into a list from inside the function? – RustyShackleford Aug 21 '18 at 19:26
  • @RustyShackleford amended the answer to include a new function called collate_responses, though it should probably be named collect_response_text or something – Swift Aug 21 '18 at 19:34
  • You sir, are a genius. Thank you so much! – RustyShackleford Aug 21 '18 at 19:39
  • Many thanks, glad to have helped. I taught myself all my Python knowledge; if it helps, dedication and perseverance go a long way in the programming world. – Swift Aug 21 '18 at 19:41
  • I am working on both those things. Thank you very much – RustyShackleford Aug 21 '18 at 19:42