
I need an explanation of what requests does with a proxyDict, specifically the following:

1. Does it evenly cycle through all of the proxies in the dictionary?

2. What happens if one of them goes down, will requests be able to handle it, or do I have to?

3. What happens if one gets "banned", will it handle it?

4. If I make a get call in a function, will it still cycle through the proxies evenly?

So if I have a dictionary of proxies like so:

proxyDict = { 
    'https' : 'https://IP1:PORT', 
    'https' : 'https://IP2:PORT', 
    'https' : 'https://IP3:PORT',
    'https' : 'https://IP4:PORT'
}

And I have a get request:

import requests

s = requests.Session()
data = {"Username":"user", "Password":"pass"}
s.get(download_url, proxies = proxyDict, verify=False)

Which might be in a function, similarly to this (my question #4):

def foo(download_url, proxyDict, s):
    s.get(download_url, proxies = proxyDict, verify=False)

Also is there any way to print which proxy is currently in use?

SPYBUG96
  • That dictionary obviously won't work, you are assigning multiple values to the same key in different lines. – Anshul Goyal Dec 06 '17 at 16:42
  • @mu無 oops, it's just an example used to give context to my question – SPYBUG96 Dec 06 '17 at 16:44
  • Hmm I see. I was in the middle of writing an answer, but now this seems like a duplicate of https://stackoverflow.com/q/8287628/1860929 – Anshul Goyal Dec 06 '17 at 16:45
  • @mu無 I read the article, but that does not answer any of my specific questions – SPYBUG96 Dec 06 '17 at 16:47
  • It really seems as if a few simple tests would answer most of your questions. – larsks Dec 06 '17 at 17:21
  • @larsks I've been testing as best I can. I've been reading the docs on requests and there isn't anything on how to show the last proxy used, so I can't test whether the proxies cycle. Also, I don't have any banned IPs, and I don't have the power to "turn off" one of our proxies – SPYBUG96 Dec 06 '17 at 17:24
  • @larsks I just thought of a better question. "How do you tell which proxy was used last with python requests?" – SPYBUG96 Dec 06 '17 at 17:26

2 Answers


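I think you will find that the keys in your proxyDict are supposed to be protocols (like http or https), and that requests will simply ignore proxies with keys like https1, etc. Since a dictionary holds only one value per key, this also means you can have at most one proxy per protocol.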
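Plain Python is enough to see why the question's dictionary can only hold one proxy per protocol: repeated keys in a dict literal overwrite each other, and the last one wins. A minimal sketch using the question's placeholder addresses (proxy_pool is a hypothetical name, and a list of dicts is just one way to keep several proxies around):

```python
# The question's dictionary, with its placeholder addresses.
proxyDict = {
    'https': 'https://IP1:PORT',
    'https': 'https://IP2:PORT',
    'https': 'https://IP3:PORT',
    'https': 'https://IP4:PORT',
}

# A dict keeps a single value per key: each repeated 'https' overwrites
# the previous one, so only the last proxy survives.
print(proxyDict)       # {'https': 'https://IP4:PORT'}
print(len(proxyDict))  # 1

# To keep several proxies, store one proxies-style dict per proxy in a list
# (hypothetical layout; you then pick one dict per request yourself).
proxy_pool = [{'https': 'https://IP%d:PORT' % n} for n in range(1, 5)]
print(len(proxy_pool))  # 4
```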

If you enable DEBUG logging, you can see which proxy requests is using. Consider this initial request made without a proxy:

>>> import logging
>>> import requests
>>> logging.basicConfig(level='DEBUG')
>>> requests.get('http://google.com')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): google.com
DEBUG:urllib3.connectionpool:http://google.com:80 "GET / HTTP/1.1" 301 219
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.google.com
DEBUG:urllib3.connectionpool:http://www.google.com:80 "GET / HTTP/1.1" 200 4796
<Response [200]>

Now, let's set up a proxy dictionary:

>>> proxyDict={'http': 'http://squid.corp.example.com:3128'}

And re-issue the request using that dictionary:

>>> requests.get('http://google.com', proxies=proxyDict)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): squid.corp.example.com
DEBUG:urllib3.connectionpool:http://squid.corp.example.com:3128 "GET http://google.com/ HTTP/1.1" 301 219
DEBUG:urllib3.connectionpool:http://squid.corp.example.com:3128 "GET http://www.google.com/ HTTP/1.1" 200 4768
<Response [200]>

You can see in the DEBUG messages that it is using the proxy rather than making a direct connection. Now if we use your proxy dictionary and make the same request...

>>> proxyDict = { 
...     'https1' : 'https://IP1:PORT', 
...     'https2' : 'https://IP2:PORT', 
...     'https3' : 'https://IP3:PORT',
...     'https4' : 'https://IP4:PORT'
... }
>>> requests.get('http://google.com', proxies=proxyDict)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): google.com
DEBUG:urllib3.connectionpool:http://google.com:80 "GET / HTTP/1.1" 301 219
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): www.google.com
DEBUG:urllib3.connectionpool:http://www.google.com:80 "GET / HTTP/1.1" 200 4790
<Response [200]>

...you can see that it doesn't use any proxies.
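To check this without a network connection, here is a simplified model of the lookup. It is modeled on requests' internal helper requests.utils.select_proxy, which is not public API, so the exact key order may differ between versions; treat it as an approximation. pick_proxy is a hypothetical name. Keys are tried from most to least specific, and a key like 'https1' never matches anything.

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Simplified model of how requests picks a proxy for url (hypothetical helper).

    Candidate keys are tried from most to least specific; the first match wins.
    """
    proxies = proxies or {}
    parts = urlparse(url)
    if parts.hostname is None:
        return proxies.get(parts.scheme, proxies.get("all"))
    candidates = [
        parts.scheme + "://" + parts.hostname,  # e.g. 'http://google.com'
        parts.scheme,                           # e.g. 'http'
        "all://" + parts.hostname,
        "all",
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    # No match in this dict (requests may still fall back to proxies
    # taken from environment variables such as HTTP_PROXY).
    return None


question_style = {'https1': 'https://IP1:PORT', 'https2': 'https://IP2:PORT'}
squid = {'http': 'http://squid.corp.example.com:3128'}

print(pick_proxy('http://google.com', question_style))  # None
print(pick_proxy('http://google.com', squid))           # http://squid.corp.example.com:3128
print(pick_proxy('https://google.com', squid))          # None
```

The last line also shows that the squid dictionary only applies to http:// URLs; an https:// request would need an 'https' key.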

larsks
  • So weird, I guess my program wasn't using a proxy when I set up the proxyDict, which leads to the question, "Why do so many other questions related to proxies and requests use a proxyDict in their answer?" I should just use a list of proxies and cycle through them myself lol – SPYBUG96 Dec 06 '17 at 18:23
  • Random question, using DEBUG, what would one of the lines look like if a proxy server was used? – SPYBUG96 Dec 07 '17 at 15:27
  • You mean like in the answer here, just before "You can see in the DEBUG messages that it is using the proxy..."? – larsks Dec 07 '17 at 15:38

1. Does it evenly cycle through all of the proxies in the dictionary?
No, it does not. proxies is a dictionary that maps protocols to proxies; requests uses the proxy whose key matches the protocol of the request (if any), so the dictionary can only hold one proxy per protocol.

2. What happens if one of them goes down, will requests be able to handle it, or do I have to?
If a proxy is unavailable for some reason, requests will raise an exception (typically requests.exceptions.ProxyError or requests.exceptions.ConnectionError), which you can catch.

3. What happens if one gets "banned", will it handle it?
No, but you can detect an IP ban by checking the status code and the response body.

4. If I make a get call in a function, will it still cycle through the proxies evenly?
No, it won't (see 1). However, you could create a list of proxies and loop over it.

An example:

import requests

def next_proxy(current):
    '''Return the proxy after current in proxies, wrapping around to the first one.'''
    if not proxies:
        return None
    if current not in proxies or current == proxies[-1]:
        return proxies[0]
    return proxies[proxies.index(current) + 1]

def bad_response(response, error_message='some message'):
    '''Detect an IP ban or another bad response (adjust error_message to your site).'''
    return response.status_code == 403 or error_message in response.text

proxies = [
    {'https':'https://177.131.51.155:53281', 'http':'http://177.131.51.155:53281'},
    {'https':'https://138.197.45.196:8118', 'http':'http://138.197.45.196:8118'},
    {'https':'https://153.146.159.139:8080', 'http':'http://153.146.159.139:8080'},
]

s = requests.Session()
proxy = None  # the first request goes out directly, so you can see your own IP
for _ in range(10):
    print("Using proxy:", proxy)
    try:
        r = s.get("http://jsonip.com/", proxies=proxy, timeout=10)
        print(r.text)
        if bad_response(r):
            print("bad response")
            # if proxy in proxies: proxies.remove(proxy)  # optionally drop banned proxies
    except (requests.exceptions.ProxyError, requests.exceptions.ConnectionError,
            requests.exceptions.Timeout):
        # The proxy is down, refused the connection, or timed out.
        print("proxy error")
        if proxy in proxies:  # proxy is None on the first, direct request
            proxies.remove(proxy)
    proxy = next_proxy(proxy)
    if proxy is None:  # every proxy was removed; stop instead of going direct
        break
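Since requests never rotates proxies itself, the rotation has to live in your own code. Below is a minimal sketch of that using only the standard library; ProxyRotator and get_via_proxies are hypothetical helpers, not part of requests. Proxies are handed out round-robin with collections.deque, a proxy is dropped when the call raises one of the retry_on exceptions or when an optional is_bad check flags the response (e.g. a ban), and the proxy that was actually used is returned so you can print it.

```python
from collections import deque


class ProxyRotator:
    """Hand out proxies-style dicts round-robin (hypothetical helper, not part of requests)."""

    def __init__(self, proxy_dicts):
        self._pool = deque(proxy_dicts)

    def __len__(self):
        return len(self._pool)

    def get_next(self):
        """Return the proxy at the front of the pool and move it to the back."""
        if not self._pool:
            raise LookupError("no proxies left")
        proxy = self._pool[0]
        self._pool.rotate(-1)
        return proxy

    def drop(self, proxy):
        """Remove a proxy that is down or banned (no-op if it is already gone)."""
        try:
            self._pool.remove(proxy)
        except ValueError:
            pass


def get_via_proxies(get, url, rotator, retry_on, is_bad=None, **kwargs):
    """Call get(url, proxies=..., **kwargs), moving on to the next proxy on failure.

    retry_on: exception type(s) meaning "this proxy is unusable".
    is_bad:   optional check on the response, e.g. to spot an IP ban.
    Returns (proxy_used, response) so the caller can log which proxy was used.
    """
    while len(rotator):
        proxy = rotator.get_next()
        try:
            response = get(url, proxies=proxy, **kwargs)
        except retry_on:
            rotator.drop(proxy)  # proxy is down or unreachable
            continue
        if is_bad is not None and is_bad(response):
            rotator.drop(proxy)  # response looks like a ban
            continue
        return proxy, response
    raise LookupError("all proxies failed")
```

With requests, you would keep one ProxyRotator alive for the whole program (for example, pass it into your foo function) so the rotation stays even across calls, and pass the session's get method together with the requests exceptions, e.g. get_via_proxies(s.get, download_url, rotator, (requests.exceptions.ProxyError, requests.exceptions.ConnectionError), is_bad=bad_response, timeout=10). A dropped proxy never comes back, so you may want to rebuild the pool from time to time.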
t.m.adam