262

Just a short, simple one about the excellent Requests module for Python.

I can't seem to find in the documentation what the proxies variable should contain. When I passed it a dict with a standard "IP:PORT" value, it was rejected with a complaint that it expected two values. So I guess (because this doesn't seem to be covered in the docs) that the first value is the IP and the second the port?

The docs mention this only:

proxies – (optional) Dictionary mapping protocol to the URL of the proxy.

So I tried this... what should I be doing?

proxy = { ip: port}

and should I convert these to some type before putting them in the dict?

r = requests.get(url,headers=headers,proxies=proxy)
Artemis

12 Answers

435

The proxies dict syntax is {"protocol": "scheme://ip:port", ...}. With it you can specify different (or the same) proxies for requests made over the http, https, and ftp protocols:

http_proxy  = "http://10.10.1.10:3128"
https_proxy = "https://10.10.1.11:1080"
ftp_proxy   = "ftp://10.10.1.10:3128"

proxies = {
    "http":  http_proxy,
    "https": https_proxy,
    "ftp":   ftp_proxy
}

r = requests.get(url, headers=headers, proxies=proxies)

Deduced from the requests documentation:

Parameters:
method – method for the new Request object.
url – URL for the new Request object.
...
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
...


On Linux you can also do this via the HTTP_PROXY, HTTPS_PROXY, and FTP_PROXY environment variables. Note that the value should include the scheme, not just host:port (and an HTTPS proxy can itself be a plain http:// URL):

export HTTP_PROXY=http://10.10.1.10:3128
export HTTPS_PROXY=https://10.10.1.11:1080
export FTP_PROXY=ftp://10.10.1.10:3128

On Windows:

set http_proxy=http://10.10.1.10:3128
set https_proxy=https://10.10.1.11:1080
set ftp_proxy=ftp://10.10.1.10:3128
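With the variables exported, you can check which proxies Requests will resolve for a given URL; a small sketch using requests.utils.get_environ_proxies (the URL below is just a placeholder):

import requests.utils

# Prints the proxy mapping Requests derives from the environment,
# e.g. {'http': 'http://10.10.1.10:3128', ...} if the variables above are set.
print(requests.utils.get_environ_proxies('http://example.org'))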
chown
  • @cigar I knew because urllib2 uses the exact same format for their proxies dict, and when I saw http://docs.python-requests.org/en/latest/api/#module-requests say "proxies – (optional) Dictionary mapping protocol to the URL of the proxy.", I knew right away. – chown Nov 27 '11 at 18:12
  • ahhh I see, I never used proxies with urllib2 because of the advice to get rid of it obtained from here; replaced 2 pages of code with 8 lines :/ re:shoulder :))) great stay here, you have already saved me hours in total! if you ever need any help with music gimme a shout, that I can give advice on, otherwise can't think of a way to repay other than massive thanks or cups of tea! – Nov 27 '11 at 18:17
  • It seems requests and moreover urllib3 can't do a CONNECT when using a proxy :( – dzen Dec 20 '11 at 08:22
  • @dzen I have not yet used `urllib3` so I'll have to look into that. Thanks for the heads up. – chown Dec 20 '11 at 21:26
  • requests is a wrapper around urllib3, which is bundled into this module: https://github.com/kennethreitz/requests/tree/develop/requests/packages/urllib3 – dzen Dec 22 '11 at 09:32
  • @chown the syntax changed with requests 2.0.0. You'll need to add a scheme to the URL: http://docs.python-requests.org/en/latest/user/advanced/#proxies It'd be nice if you could add this to your answer here – Jay Mar 24 '14 at 07:57
  • @Jay: I added the URL scheme. – Johannes Charra Oct 02 '14 at 13:29
  • This will not work for `socks5` proxy: `'http' : "socks5://myproxy:9191",` – loretoparisi Apr 05 '16 at 10:21
  • Those are bad examples of the Linux environment variables HTTP_PROXY and HTTPS_PROXY. The protocol should always be included (not just host:port), and either proxy can itself be "https" or "http". HTTPS_PROXY=http://myhttpsproxy:8080 is valid; it just means proxy "https" requests using http://myhttpsproxy:8080 instead of the value of HTTP_PROXY. If you don't define HTTPS_PROXY, Linux apps typically use CONNECT over the HTTP_PROXY. – jamshid Jun 30 '18 at 23:32
  • What if you want multiple proxies per protocol? Currently you just have one for each. – MasayoMusic Jun 24 '19 at 22:23
  • @Jay link is dead. New link is https://2.python-requests.org/en/master/user/advanced/#proxies – mincom Apr 28 '20 at 05:25
  • Not only protocol, but host is possible as well: `proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}` – MrKsn Apr 13 '21 at 12:32
  • @chown I tried out the exact same code with my IP and port number, but it still blocks the website that I scrape data from (craigslist.com). Any idea about this? – QUEEN Apr 20 '22 at 08:25
  • @chown I replaced the `http_proxy` with my IP address and the port as `8888` because that's where my localhost was running. Should the port values like '3128, 1080' be the same for all devices? What about the IP addresses then? If the website gets hits from the same IPs every time, it will surely block! – QUEEN Apr 20 '22 at 08:46
  • This doesn't work in Windows 10 from the command line; I continue to get errors when Python tries to connect to a YouTube source. – gianni Sep 01 '22 at 10:33
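A note on the socks5 comment above: Requests does support SOCKS proxies as of version 2.10, but only with the socks extra installed (pip install requests[socks]). A minimal sketch, assuming a SOCKS5 proxy listening on 127.0.0.1:9050 (e.g. Tor):

import requests  # requires: pip install requests[socks]

# 127.0.0.1:9050 is an assumed local SOCKS5 proxy; use socks5h:// instead
# if DNS resolution should also go through the proxy.
proxies = {
    'http': 'socks5://127.0.0.1:9050',
    'https': 'socks5://127.0.0.1:9050',
}
r = requests.get('http://example.org', proxies=proxies)
print(r.status_code)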
53

You can refer to the proxy documentation here.

If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "https://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)

To use HTTP Basic Auth with your proxy, use the http://user:password@host.com/ syntax:

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/"
}
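For completeness, a minimal sketch passing that auth-embedded proxy to a request (the host and credentials are placeholders):

import requests

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/"  # placeholder credentials and host
}
r = requests.get("http://example.org", proxies=proxies)
print(r.status_code)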
Zhifeng Hu
40

I have found that urllib has some really good code to pick up the system's proxy settings, and it happens to be in the correct form to use directly. You can use it like this:

import urllib.request

...
r = requests.get('http://example.org', proxies=urllib.request.getproxies())

It works really well, and urllib knows how to get the Mac OS X and Windows settings as well.
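A minimal sketch combining this with a Session, so the system proxy settings apply to every request made through it:

import urllib.request
import requests

session = requests.Session()
# Copy the OS-level proxy settings (environment variables, Windows registry,
# macOS network config) into the session once.
session.proxies.update(urllib.request.getproxies())
r = session.get('http://example.org')
print(r.status_code)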

EddyG
Ben Golding
23

The accepted answer was a good start for me, but I kept getting the following error:

AssertionError: Not supported proxy scheme None

The fix was to specify the http:// scheme in the proxy URL, thus:

http_proxy  = "http://194.62.145.248:8080"
https_proxy = "https://194.62.145.248:8080"
ftp_proxy   = "ftp://10.10.1.10:3128"

proxyDict = {
    "http":  http_proxy,
    "https": https_proxy,
    "ftp":   ftp_proxy
}

I'd be interested as to why the original works for some people but not me.

Edit: I see the main answer is now updated to reflect this :)

Owen B
  • changed with 2.0.0: Proxy URLs now must have an explicit scheme. A MissingSchema exception will be raised if they don't. – Jay Mar 24 '14 at 07:54
14

If you'd like to persist cookies and session data, you'd best do it like this:

import requests

proxies = {
    'http': 'http://user:pass@10.10.1.0:3128',
    'https': 'https://user:pass@10.10.1.0:3128',
}

# Create the session and set the proxies.
s = requests.Session()
s.proxies = proxies

# Make the HTTP request through the session.
r = s.get('http://www.showmemyip.com/')
User
12

8 years late. But I like:

import os
import requests

os.environ['HTTP_PROXY'] = os.environ['http_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['HTTPS_PROXY'] = os.environ['https_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['NO_PROXY'] = os.environ['no_proxy'] = '127.0.0.1,localhost,.local'

r = requests.get('https://example.com')  # append verify=False here if needed
qräbnö
  • I like this last-resort solution that no one else mentioned here. It just saved my day, as there was no other way of passing proxy settings to a 3rd-party library I'm using. – t3chb0t May 13 '22 at 06:40
9

The documentation gives a very clear example of proxy usage:

import requests

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)

What isn't documented, however, is that you can even configure proxies for individual URLs, even if the scheme is the same! This comes in handy when you want to use different proxies for different websites you wish to scrape.

proxies = {
  'http://example.org': 'http://10.10.1.10:3128',
  'http://something.test': 'http://10.10.1.10:1080',
}

requests.get('http://something.test/some/url', proxies=proxies)

Additionally, requests.get essentially uses a requests.Session under the hood, so if you need more control, use it directly:

import requests

proxies = {
  'http': 'http://10.10.1.10:3128',
  'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)

session.get('http://example.org')

I use it to set a fallback (a default proxy) that handles all traffic that doesn't match the schemes/URLs specified in the dictionary. Note that the fallback only kicks in when the other entries are URL-specific; a plain 'http' key in the dictionary would simply override it:

import requests

proxies = {
  'http://example.org': 'http://10.10.1.10:3128',
  'http://something.test': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)
# Anything not matching a URL key above falls back to this default proxy.
session.proxies.setdefault('http', 'http://127.0.0.1:9009')

session.get('http://example.org')
Zyy
2

I just made a proxy grabber that can also connect through the same grabbed proxy, without any input. Here it is:

#Import modules

from termcolor import colored
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import os
import time

#Proxy grab

options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(options=options)
driver.get("https://www.sslproxies.org/")
tbody = driver.find_element(By.TAG_NAME, "tbody")
rows = tbody.find_elements(By.TAG_NAME, "tr")
for row in rows:
    # The first two columns of each row are the IP and the port.
    columns = row.text.split(" ")
    print(colored(columns[0] + ":" + columns[1], 'yellow'))
driver.quit()
print("")

os.system('cls' if os.name == 'nt' else 'clear')

#Proxy connection

print(colored('Getting proxies from grabber...', 'green'))
time.sleep(2)
os.system('cls' if os.name == 'nt' else 'clear')
# Use the last proxy printed by the loop above.
proxy = {"http": "http://" + columns[0] + ":" + columns[1]}
url = 'https://mobile.facebook.com/login'
r = requests.get(url, proxies=proxy)
print("")
print(colored('Connecting using proxy', 'green'))
print("")
sts = r.status_code
Rae mh
1

Here is my basic class in Python for the requests module, with some proxy configuration and a stopwatch!

import requests
import time

class BaseCheck():
    def __init__(self, url):
        self.http_proxy  = "http://user:pw@proxy:8080"
        self.https_proxy = "http://user:pw@proxy:8080"
        self.ftp_proxy   = "http://user:pw@proxy:8080"
        self.proxyDict = {
            "http":  self.http_proxy,
            "https": self.https_proxy,
            "ftp":   self.ftp_proxy
        }
        self.url = url

        def makearr(tsteps):
            # Create a start/end timestamp slot for each named step.
            global stemps
            global steps
            stemps = {}
            for step in tsteps:
                stemps[step] = {'start': 0, 'end': 0}
            steps = tsteps
        makearr(['init', 'check'])

        def starttime(typ=""):
            # Record the current time for every step.
            for stemp in stemps:
                if typ == "":
                    stemps[stemp]['start'] = time.time()
                else:
                    stemps[stemp][typ] = time.time()
        starttime()

    def __str__(self):
        return str(self.url)

    def getrequests(self):
        g = requests.get(self.url, proxies=self.proxyDict)
        print(g.status_code)
        print(g.content)
        print(self.url)
        stemps['init']['end'] = time.time()
        # Elapsed time for the request.
        x = stemps['init']['end'] - stemps['init']['start']
        print(x)


test = BaseCheck(url='http://google.com')
test.getrequests()
mtt2p
0

It's a bit late, but here is a wrapper class that simplifies scraping proxies and then making an HTTP POST or GET:

ProxyRequests

https://github.com/rootVIII/proxy_requests
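A rough usage sketch, based on the project's README at the time of writing; treat the class and method names as assumptions and check the repo before relying on them:

# Assumed API from the proxy_requests README: pip install proxy-requests
from proxy_requests import ProxyRequests

r = ProxyRequests("https://api.ipify.org")
r.get()          # fetches the URL through a randomly scraped proxy
print(r)         # response body
print(r.get_status_code())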
0

Already tested; the following code works. You need to use HTTPProxyAuth.

import requests
from requests.auth import HTTPProxyAuth


USE_PROXY = True
proxy_user = "aaa"
proxy_password = "bbb"
http_proxy = "http://your_proxy_server:8080"
https_proxy = "http://your_proxy_server:8080"
proxies = {
    "http": http_proxy,
    "https": https_proxy
}

def test(name):
    print(f'Hi, {name}')
    # Create the session and set the proxies.
    session = requests.Session()
    if USE_PROXY:
        session.trust_env = False
        session.proxies = proxies
        session.auth = HTTPProxyAuth(proxy_user, proxy_password)

    r = session.get('https://www.stackoverflow.com')
    print(r.status_code)

if __name__ == '__main__':
    test('aaa')
mobabel
-2

I am sharing some code for how to fetch proxies from the site https://free-proxy-list.net and store the data in a file compatible with tools like "Elite Proxy Switcher" (format IP:PORT):

##PROXY_UPDATER - get free proxies from https://free-proxy-list.net/

from lxml.html import fromstring
import requests

######################FIND PROXIES#########################################
def get_proxies():
    url = 'https://free-proxy-list.net/'
    response = requests.get(url)
    parser = fromstring(response.text)
    proxies = set()
    for i in parser.xpath('//tbody/tr')[:299]:   #299 proxies max
        # Column 1 holds the IP, column 2 the port.
        proxy = ":".join([i.xpath('.//td[1]/text()')[0],
                          i.xpath('.//td[2]/text()')[0]])
        proxies.add(proxy)
    return proxies


######################write to file in format   IP:PORT######################
try:
    proxies = get_proxies()
    with open('proxy_list.txt', 'w') as f:
        for proxy in proxies:
            f.write(proxy + '\n')
    print("DONE")
except Exception:
    print("MAJOR ERROR")
Lambov