
I'm a little confused about the requests module, especially its proxies.

From documentation:

PROXIES

Dictionary mapping protocol to the URL of the proxy (e.g. {‘http’: ‘foo.bar:3128’}) to be used on each Request.

Can the dictionary hold more than one proxy per protocol? That is, can I pass a list of proxies and have the requests module try them and use only the ones that work?

Or can there be only one proxy address per protocol, for example one for http?

Milano
  • seen [this SO QA](http://stackoverflow.com/questions/18369598/python-how-to-use-requests-library-to-access-a-url-through-several-different-pr) yet? From it, it sounds like it's possible to have multiple proxies for a single protocol. Try! – Pynchia Jul 31 '15 at 19:40
  • Seems only one proxy is valid for one protocol. You may have to check the availability of proxies by yourself. – czheo Jul 31 '15 at 19:54
  • proxies in requests are very crappy imho ... especially if the user is behind a proxyconfigfile and other nonsense – Joran Beasley Jul 31 '15 at 19:59
  • OK, I have tried myself. Obviously it works because multiple keys in the dictionary are overwritten and only the last entry is considered. – Pynchia Jul 31 '15 at 20:04
  • @Pynchia So it doesn't work, does it? I was thinking about something like 'http':[ip,ip,ip..] – Milano Jul 31 '15 at 20:08
  • please see my answer below – Pynchia Jul 31 '15 at 20:17

2 Answers


Using the proxies parameter is limited by the very nature of a Python dictionary (i.e. each key must be unique).

import requests

url = 'http://google.com'
# 'http' appears twice on purpose: a dict keeps only the last value for a key
proxies = {'https': '84.22.41.1:3128',
           'http': '185.26.183.14:80',
           'http': '178.33.230.114:3128'}

if __name__ == '__main__':
    # single-argument print() calls behave the same on Python 2 and Python 3
    print(url)
    print(proxies)
    response = requests.get(url, proxies=proxies)
    if response.status_code == 200:
        print(response.text)
    else:
        print('Response ERROR %s' % response.status_code)

outputs

http://google.com
{'http': '178.33.230.114:3128', 'https': '84.22.41.1:3128'}
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for."
...more html...

As you can see, the value of the http key in the proxies dictionary is the one assigned last in the literal (i.e. 178.33.230.114:3128); the earlier http entry is silently discarded. Try swapping the http entries around.
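The overwrite happens in Python itself, before requests ever sees the dictionary. A minimal offline sketch, using the same addresses as above:

```python
# A dict literal may repeat a key; Python keeps only the last value for it.
proxies = {'https': '84.22.41.1:3128',
           'http': '185.26.183.14:80',
           'http': '178.33.230.114:3128'}

print(len(proxies))      # 2
print(proxies['http'])   # 178.33.230.114:3128
```

Python accepts the repeated key without any warning, which is why the first http proxy disappears without a trace.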

So, the answer is no, you cannot specify multiple proxies for the same protocol using a simple dictionary.

I have tried using an iterable as a value, which would make sense to me:

proxies = {'https': '84.22.41.1:3128',
           'http': ('178.33.230.114:3128', '185.26.183.14:80', )}

but with no luck: it produces an error.
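Since the dictionary holds one proxy per protocol, trying several proxies has to be done by hand, as one of the comments on the question suggests. Here is a rough sketch of that idea. The function name `first_working_proxy`, the `fetch` parameter and the 5 second timeout are my own assumptions, not part of requests; `fetch` is meant to be `requests.get` or anything with the same call signature.

```python
def first_working_proxy(url, candidates, fetch, timeout=5):
    """Try each proxy address in order; return (address, response) for the first that works.

    Hypothetical helper: `fetch` is any callable that accepts
    (url, proxies=..., timeout=...), for example requests.get.
    Returns (None, None) when no candidate succeeds.
    """
    for address in candidates:
        # One proxy per protocol, as the proxies dictionary requires.
        proxies = {'http': address, 'https': address}
        try:
            response = fetch(url, proxies=proxies, timeout=timeout)
        except OSError:
            # requests' exceptions derive from IOError (OSError on Python 3),
            # so unreachable proxies and timeouts end up here.
            continue
        if response.status_code == 200:
            return address, response
    return None, None
```

With requests installed, a call would look like `first_working_proxy('http://google.com', ['185.26.183.14:80', '178.33.230.114:3128'], requests.get)`. Treating only status 200 as success mirrors the check in the script above; other definitions of "working" are equally possible.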

Pynchia
  • Thank you for your answer. I've tried many proxies, but there seems to be a problem. r = requests.get('https://wtfismyip.com/text', proxies={'http':'52.10.202.111:8080'}) returns my IP address instead of the proxy's, even though it is an anonymous proxy. – Milano Jul 31 '15 at 20:55
  • you have a typo in the `get` call (`;`). I have tried with the proxies in my answer a few minutes ago and they work fine. I got the proxies from [this site](http://www.cool-proxy.net/proxies/http_proxy_list/sort:score/direction:desc). Does my answer satisfy your question? – Pynchia Jul 31 '15 at 20:58
  • Oh, I'm sorry. I didn't notice that wtfismyip uses https instead of http. Now everything works correctly :) – Milano Jul 31 '15 at 21:01
  • And yes, your answer satisfied my question. Thank you – Milano Jul 31 '15 at 21:01
  • Thank you, I have learnt something new today, thanks to your question. :) – Pynchia Jul 31 '15 at 21:02
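
The mix-up in the comments above comes from how the dictionary is consulted: the key is matched against the scheme of the URL being requested. The sketch below is a simplified model of that lookup, not the actual requests code (which, as far as I know, also accepts host-specific keys); `pick_proxy` is a hypothetical name.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Simplified model: look the proxy up by the scheme of the target URL."""
    scheme = urlsplit(url).scheme
    return proxies.get(scheme)


proxies = {'http': '52.10.202.111:8080'}
print(pick_proxy('http://example.com/', proxies))         # 52.10.202.111:8080
print(pick_proxy('https://wtfismyip.com/text', proxies))  # None, there is no 'https' entry
```

An https URL therefore needs its own 'https' key in the dictionary, even when the same proxy address serves both protocols.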

Well, actually you can. I've done this with a few lines of code and it works pretty well.

import requests


class Client:

    def __init__(self):
        self._session = requests.Session()
        self.proxies = None

    def set_proxy_pool(self, proxies, auth=None, https=True):
        """Randomly choose a proxy for every GET/POST request.

        :param proxies: list of proxies, like ["ip1:port1", "ip2:port2"]
        :param auth: optional value passed as the ``auth`` argument of every request, for proxies that need auth
        :param https: default is True, pass False if you don't need https proxy
        """
        from random import choice

        if https:
            self.proxies = [{'http': p, 'https': p} for p in proxies]
        else:
            self.proxies = [{'http': p} for p in proxies]

        def get_with_random_proxy(url, **kwargs):
            proxy = choice(self.proxies)
            kwargs['proxies'] = proxy
            if auth:
                kwargs['auth'] = auth
            return self._session.original_get(url, **kwargs)

        def post_with_random_proxy(url, *args, **kwargs):
            proxy = choice(self.proxies)
            kwargs['proxies'] = proxy
            if auth:
                kwargs['auth'] = auth
            return self._session.original_post(url, *args, **kwargs)

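        # Save the originals only once, so that calling set_proxy_pool() again
        # does not wrap the wrappers (which would recurse forever).
        if not hasattr(self._session, 'original_get'):
            self._session.original_get = self._session.get
            self._session.original_post = self._session.post
        self._session.get = get_with_random_proxy
        self._session.post = post_with_random_proxy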

    def remove_proxy_pool(self):
        self.proxies = None
        self._session.get = self._session.original_get
        self._session.post = self._session.original_post
        del self._session.original_get
        del self._session.original_post

    # You can define whatever operations using self._session

I use it like this:

client = Client()
client.set_proxy_pool(['112.25.41.136', '180.97.29.57'])

It's simple, but it actually works for me.
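
If patching the session's methods feels too invasive, the selection logic can also live on its own. This is only a sketch under my own assumptions: `ProxyPool`, `pick` and `discard` are hypothetical names and not part of requests.

```python
import random


class ProxyPool:
    """Hypothetical helper: holds proxy addresses and hands out requests-style dicts."""

    def __init__(self, addresses, https=True):
        self._addresses = list(addresses)
        self._https = https

    def pick(self):
        """Return a proxies dict for one randomly chosen address."""
        if not self._addresses:
            raise LookupError('proxy pool is empty')
        address = random.choice(self._addresses)
        proxies = {'http': address}
        if self._https:
            proxies['https'] = address
        return proxies

    def discard(self, address):
        """Drop an address that turned out not to work."""
        if address in self._addresses:
            self._addresses.remove(address)
```

Each request then passes a fresh pick explicitly, e.g. `requests.get(url, proxies=pool.pick())`, and a proxy that fails can be removed with `pool.discard(...)`.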

laike9m