I'm using the requests module in my script, and I want to understand the proxies parameter of the get() method. Another answer posted the following code to illustrate the use of the proxies parameter:

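import requests

url = "http://www.example.com/"  # placeholder target, not part of the quoted snippet
headers = {}                     # placeholder headers, not part of the quoted snippet

http_proxy = "10.10.1.10:3128"
https_proxy = "10.10.1.11:1080"
ftp_proxy = "10.10.1.10:3128"

proxyDict = {"http": http_proxy, "https": https_proxy, "ftp": ftp_proxy}

r = requests.get(url, headers=headers, proxies=proxyDict)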

Here are my questions:

  1. Why are we passing more than one proxy to get()? How does get() use them? Does it try them one by one?

  2. Given a proxy, say a.b.c.d:port, how would I know its protocol type? When you buy premium proxies from hidemyass.com, they send the proxies in ip:port format only and don't mention the protocol type. So what should I pass to the requests.get() method?

I have these doubts because I don't know much about proxies in general or how they work, so it would be great if somebody could explain that as well.

Nawaz

1 Answer

  1. .get() uses the proxy whose key in the dictionary matches the scheme of the URL. That is, if you access 'http://www.google.com/', the proxy whose key is 'http' (in your example, http_proxy) will be used. If you access 'https://www.google.com/', the proxy whose key is 'https' (in your example, https_proxy) will be used.

  2. The short answer is that any paid proxy should accept both HTTP and HTTPS URLs.

    In practice, this is made complicated by Requests, which does two unexpected things. Firstly, if you use proxy addresses in the form you've provided in your question (i.e. ip:port), Requests will assume the protocol used to access the proxy is the same as the protocol you're proxying. That is, http_proxy will be internally converted to "http://10.10.1.10:3128", and https_proxy to "https://10.10.1.11:1080". This is usually not what you want, so you should always be explicit and use the form scheme://ip:port.

    The second thing is that Requests currently has real problems with HTTPS through proxies. In general you should assume that HTTPS requests through a proxy don't work, though it's actually a bit more complex than that.

    Both of these problems are likely to be addressed in the planned V2.0 release.
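To make point 1 concrete, here is a minimal sketch of the selection rule described above: the proxy is looked up by the scheme of the URL being requested. The helper `pick_proxy` is hypothetical and only illustrates the idea; it is not the library's own code and ignores any other lookup rules Requests may apply. The proxy addresses are the ones from the question, written in the explicit scheme://ip:port form recommended in point 2.

```python
from urllib.parse import urlsplit

# Addresses from the question, with an explicit scheme for reaching the proxy.
PROXIES = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.11:1080",
}


def pick_proxy(url, proxies):
    """Hypothetical helper: return the proxy whose key matches the URL's scheme."""
    scheme = urlsplit(url).scheme.lower()
    # No fallback and no retrying: a scheme without an entry simply gets no proxy.
    return proxies.get(scheme)


print(pick_proxy("http://www.google.com/", PROXIES))
print(pick_proxy("https://www.google.com/", PROXIES))
```

Under this rule the entries are never tried one after another: each request uses at most the single entry that matches its scheme.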

I've written a blog post about proxies in Requests, if you'd like to know more.

As for how proxies work, their purpose is to accept HTTP requests and forward them on to their destination. Usually they are used for one of two reasons: either to mutate HTTP requests (and potentially drop them entirely), or to cache HTTP requests/responses. Wikipedia has an excellent article to get you started.

Lukasa
    Note: you probably want to use the *same* `http://` URL for both `http` and `https`. `https://` proxy URLs are most certainly wrong. (`curl`, for example, totally ignores the scheme and uses `http://`, at least for `http` and `https`.) – t-8ch Jul 22 '13 at 13:47
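Following the comment's suggestion, a bare ip:port address from a provider could be turned into a proxies dictionary roughly like this. It is only a sketch: `build_proxies` and `fetch_via_proxy` are hypothetical names, and it assumes the proxy is reached over plain HTTP, which may not hold for every provider.

```python
def build_proxies(address, scheme="http"):
    """Hypothetical helper: use one explicit proxy URL for both http and https."""
    # Assumption: an address without a scheme is reached over plain HTTP.
    if "://" not in address:
        address = f"{scheme}://{address}"
    return {"http": address, "https": address}


def fetch_via_proxy(url, address, timeout=10):
    """Fetch url through the proxy at address (needs the third-party requests package)."""
    import requests  # imported here so build_proxies can be used without it installed

    return requests.get(url, proxies=build_proxies(address), timeout=timeout)


print(build_proxies("10.10.1.10:3128"))
```

A call such as `fetch_via_proxy("http://www.google.com/", "10.10.1.10:3128")` would then send the request through that proxy, provided the proxy is actually reachable.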