30

I have looked everywhere and found millions of Python proxy servers, but none do precisely what I would like (I think).

I have quite a bit of experience with Python generally, but I'm quite new to the deep, dark secrets of the HTTP protocol.

What I think would be useful is a very simple proxy example that can be connected to and will then itself try to connect to the address passed to it.

Also, I think what has been confusing me is all the hidden machinery. For example, if the class inherits from BaseHTTPServer.BaseHTTPRequestHandler, what precisely happens when a page is requested? In many of the examples I have found there is no reference to a path variable, then suddenly, poof, self.path is used in a function. I'm assuming it's inherited, but how does it end up holding the requested path?

I'm sorry if that didn't make much sense; my idea of my problem is probably scrambled.

If you can think of anything that would make my question clearer, please suggest I add it.

Edit:

Also, a link to an explanation of the detailed process by which the proxy handles the request, fetches the page (and how to read/modify the data at that point), and passes it back to the original requester would be greatly appreciated.

jma
  • If there is any modification of either the request or the response that you want to perform, you should elaborate what that processing is. E.g. given the URL passed to you, how do you determine the URL that you want to connect to? (don't say "the same", since that would go back to you). – Martin v. Löwis Dec 10 '10 at 20:09
  • To elaborate, I meant URL filtering, access to the data for something like a weighted word count, or maybe editing it arbitrarily (still general, but access to it as a string is really all that's needed). – jma Jan 12 '11 at 18:22
  • Thought you might find this useful: https://null-byte.wonderhowto.com/how-to/sploit-make-proxy-server-python-0161232/ – user9123 Oct 21 '18 at 20:23
  • I am the author of proxy.py, a lightweight HTTP, HTTPS and WebSockets proxy server distributed as a single Python file with no external dependencies. https://github.com/abhinavsingh/proxy.py You might want to inspect its source code for inner details. Thank you. – Abhinav Singh Dec 10 '18 at 10:56

3 Answers

40

"a very simple proxy example that can be connected to and will then itself try to connect to the address passed to it." That is practically the definition of an HTTP proxy.

There's a really simple proxy example here: http://effbot.org/librarybook/simplehttpserver.htm

The core of it is just 3 lines:

class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        self.copyfile(urllib.urlopen(self.path), self.wfile)

So it's a SimpleHTTPRequestHandler that, in response to a GET request, opens the URL in the path (a request to a proxy typically looks like "GET http://example.com/", not like "GET /index.html"). It then just copies whatever it can read from that URL to the response.

Note that this is really minimal. It doesn't deal with headers at all, I believe.

BTW: path is documented at http://docs.python.org/library/basehttpserver.html. It was set before your do* method was called.
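
For completeness, a runnable Python 2 version of the same idea looks something like this. The handler is the three lines above plus a comment; the imports, the server setup and the port number are filled in by me, so they may differ slightly from the linked example:

import BaseHTTPServer
import SimpleHTTPServer
import urllib

class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        # By the time do_GET runs, the base class has already parsed the
        # request line, so self.path holds the full URL the client asked for.
        self.copyfile(urllib.urlopen(self.path), self.wfile)

# Serve the proxy on an arbitrary local port.
BaseHTTPServer.HTTPServer(('', 8000), Proxy).serve_forever()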

Laurence Gonsalves
  • That was amazingly quick, thank you! Sorry about the long-windedness. – jma Dec 10 '10 at 19:43
  • Beyond what that page says, do you have any clues as to how to edit the data? I was thinking rfile/wfile, but I have no idea where this should go in a script, or whether it's even the right thing. – jma Dec 10 '10 at 19:50
  • For minimal header handling (you don't even get 200 OK with this), add `self.send_response(200)` and `self.end_headers()` as the first two lines of `do_GET()`. (Without them, `ab` considers the requests to have failed.) – mjs Feb 13 '11 at 14:11
  • To top off mjs's comment, I think you'd better send a response code extracted from the handle returned by urlopen. Further, you'd better send back a 'Content-Length' header with the size of the data read from the handle, to make sure the HTTP TCP stream is parsed correctly at the client's end. – mshamma Feb 06 '13 at 18:55
  • How do I use it via `requests` library now? I did `requests.get(url, proxies={dict with http and https keys})` and in the proxy server console I see `code 501, message Unsupported method ('CONNECT')` – scythargon Apr 18 '18 at 13:49
  • The example only handles GET requests. Is there a way for the proxy server to handle the POST requests traffic? – oikos99 Mar 02 '19 at 23:48
  • How do I do this in Python3? `self.copyfile` doesn't exist in BaseHTTPRequestHandler – captain Jun 09 '22 at 01:44
  • @captain I haven't tested it yet, but try `shutil.copyfileobj(urllib.urlopen(self.path), self.wfile)`. – Laurence Gonsalves Jun 09 '22 at 19:33
  • Thanks @LaurenceGonsalves! That works. I also changed `urllib.urlopen()` to `urllib.request.urlopen`: `shutil.copyfileobj(urllib.request.urlopen(self.path), self.wfile)` – captain Jun 09 '22 at 19:51
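
Putting the last few comments together, a Python 3 sketch of the same minimal proxy might look roughly like this (my own untested adaptation; the port is arbitrary and, like the original, it still ignores request and response headers):

import shutil
import urllib.request
from http.server import HTTPServer, SimpleHTTPRequestHandler

class Proxy(SimpleHTTPRequestHandler):
    def do_GET(self):
        # Send a bare 200 and end the headers, as suggested above, so the
        # client sees a valid response line before the body arrives.
        self.send_response(200)
        self.end_headers()
        # self.path holds the full URL the client asked the proxy for;
        # fetch it and copy the body straight through to the client.
        shutil.copyfileobj(urllib.request.urlopen(self.path), self.wfile)

HTTPServer(('', 8080), Proxy).serve_forever()
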
17

From the Twisted wiki:

from twisted.web import proxy, http
from twisted.internet import reactor
from twisted.python import log
import sys

# Log Twisted's output (including proxied requests) to stdout.
log.startLogging(sys.stdout)

class ProxyFactory(http.HTTPFactory):
    # Use Twisted's stock HTTP proxy protocol for every connection.
    protocol = proxy.Proxy

# Listen on port 8080 and run the event loop.
reactor.listenTCP(8080, ProxyFactory())
reactor.run()
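
A quick way to check that the proxy is doing something (a hypothetical test of my own, assuming the script above is running locally on port 8080) is to route a request through it with urllib:

import urllib.request

# Send a request through the proxy listening on localhost:8080.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({'http': 'http://localhost:8080'})
)
print(opener.open('http://example.com/').read()[:200])
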
sal
5

proxpy looks rather promising; it's very simple to tweak requests and responses with it.

Dima Tisnek
  • +1. If you want a proxy which forwards the exact request (including headers and all), but want to be able to tweak the request, then you want something like ProxPy. – Simon Radford Mar 18 '14 at 02:44