I wrote a proxy server in Python and am trying to build my own file cache, but I can't resolve a problem that seems to be related to encoding (I'm not sure). I suspect the cause is the header Content-Encoding: gzip.

This is the part that handles an incoming request. The server creates a new thread and executes the following function:

def proxy_thread(self, client_socket, client_addres):
    # get the request from browser
    request_queue = []

    request = client_socket.recv(17000)
    method = request.decode("utf-8").split("\n")[0].split(" ")[0]

    if method == "GET":

        if is_in_cache(cache_path, request): # check whether the file is in the C:\TMP directory; if it exists, use it

            print("From cache")
            id = parse_url(request)
            cached_request = open_cached_www(cache_path,id)
            remote_server, remote_port = parse_request(request)
            remote_socket = socket.socket(socket.AF_INET,
                                          socket.SOCK_STREAM)  # by default: socket.AF_INET, socket.SOCK_STREAM
            remote_socket.connect((remote_server,
                                   remote_port))

            res = remote_socket.send(cached_request.encode("UTF-8"))
            print(res)

            client_socket.settimeout(5)

        else:

            print("Not from cache")
            remote_server, remote_port = parse_request(request)
            request_queue.append(request)

            remote_socket = socket.socket(socket.AF_INET,
                                          socket.SOCK_STREAM)
            remote_socket.connect((remote_server,
                                   remote_port))

            res = remote_socket.sendall(request)
            client_socket.settimeout(5)


            if res == None:
                try:

                    data = remote_socket.recv(4096)
                    id = parse_url(request) # parse_url extracts the URL from the request and creates a hash
                    client_socket.sendall(data)
                    write_to_cache(cache_path,id)
                    # ERROR: 'utf-8' codec can't decode byte 0x8b in position 367: invalid start byte

                    client_socket.close()
                    print("[*]data send successful!")

                except UnicodeDecodeError as U_err:
                    print(U_err)

The functions below let the server write to and read from the cache:

# Function write response from server, here are 
def write_to_cache(path, hash, http_response):
    full_path = path.strip("\"") + "\\" + hash  # full path to cached file
    with open(full_path, 'w') as write_cache:
        write_cache.write(http_response)
        write_cache.close()
    print("Write co cache successful")

def open_cached_www(path, hash):
    full_path = path.strip("\"") + "\\" + hash  # full path to cached file
    with open(full_path, "r") as read_cached_www:
        print("Load: ", full_path)
        cached_request = read_cached_www.read()
        read_cached_www.close()

Below is the response from http://www.example.com:

b'HTTP/1.1 200 OK\r\n
Content-Encoding: gzip\r\n #<-- I suppose this might be the problem
Accept-Ranges: bytes\r\n
Cache-Control: max-age=604800\r\n
Content-Type: text/html; charset=UTF-8\r\n
Date: Tue, 14 Jan 2020 16:34:02 GMT\r\n
Etag: "3147526947"\r\n
Expires: Tue, 21 Jan 2020 16:34:02 GMT\r\n
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT\r\n
Server: ECS (nyb/1D11)\r\n
Vary: Accept-Encoding\r\n
X-Cache: HIT\r\n
Content-Length: 648\r\n
\r\n
\x1f\x8b\x08\x00\xc2\x15\xa8]\x00\x03}TMs\xdb \x10\xbd\xfbWl\xd5K2#$\'i\x1a\x8f-i\xfa\x99i\x0fi\x0fi\x0f=\x12\xb1\xb2\x98\x08P\x01\xc9\xf6t\xf2\xdf\xbbB\x8e#7\x99\x9a\x91\x81]x\xbb\xef\xb1\x90\xbd\x12\xa6\xf4\xbb\x16\xa1\xf6\xaa)f\xd9c\x87\\\x143\xa0_\xe6\xa5o\xb0\xf8\xbc\xe5\xaam\x10>\x19\xc5\xa5\xce\xd2\xd1:\x1b\x97(\xf4\x1c\xca\x9a[\x87>\x8f:_\xb1E\x04i1q\xd6\xde\xb7\x0c\x7fw\xb2\xcf\xa3\x8fF{\xd4\x9e\ra#(\xc7Y\x1ey\xdc\xfat\x08\xbf:@\xbd\x84\xa4\xb9\xc2<\xea%nZc\xfdd\xffF\n_\xe7\x02{Y"\x0b\x93\x18\xa4\x96^\xf2\x86\xb9\x927\x98\x9f=A9\xbf#2C\x06\xfb\xc0\xa5s\xd1\xe8\xbb3b\x07\x7f\xc20Lyy\xbf\xb6\xa6\xd3\x82\x95\xa61v\t\xaf\xab9\xb5\xf3\xd5a\x89\xe2v-\xf5\x12\xe6O\xa6\x96\x0b!\xf5\xfa\xc8VQ\xa6\xac\xe2J6\xbb%0\xde\x92\x9c\xcc\xed\x9cG\x15\xc3\xd8\xb3N\xc6\xf0\xa1\x91\xfa\xfe\x86\x97\xb7\xc1tM\x9bb\x88nqm\x10~~\x8dh\xfc\xbdE\r\xb7\\\xbba\xf2\x05\x9b\x1e\xbd,9|\xc3\x0e\xc9r0\xc4\xf0\xde\x12w\xc2\xa6\xa5\xcc\xa1\x95\xd5S.a\xf0\x10\xfe\x85\xec\'t\x83pKx;\x9f\xb7\xdb\xe7\x0c/Q\x01\xef\xbcy\x81\xe89\xaa\xd5\x7fE\x13\xd4&\x19\xdc\x19+\xd02\xcb\x85\xec\x1c\xe9\x94\\\x1e\x01\x98-s5\x17fC\xc8\xed\x16.\xe8\xbb\xa2o\x18\xdb\xf5\x1d?\x99\xc7\xa1%\xf3\xf3\xd3\xd5\x84\x0c_\x0e\xea\xc5\xd4\xf7\xd2I\x8fbB\xed1\x93\x8b\xc5\x9b\xc5b\x92\xc9p\xfeL`i,\xf7\xd2\x10Km4NA\xdf)\x14\x92\xc3\x89\xe2[\xb6\xd7\xe7j\xd0\xe7t\x02~\xac\xe2QU\xfc\xa3\xd8D\xe5c\xc7\xc3$d\x96\x86\n-\xc2Ye\xe9x\x1dg\xd9P\x9bt;)\xd8\xbe\x8e\xeb\xb3g7\x93L\xa3\xaf-~\xd4\xd2\x81\x08v\xa0Qe,t\x0ea\x985M\xe7\xfc@\xb8G\xc0\x11\xc1\r\x0ez\x0e:E\xf7\xc9%\xf0\xcbtDb\x17\xb6xB\x1a\xabe\x8f\xa6\xa1!y\t\xa0\xb3Ht|m:\x0f\xad\x95\x14\xa24t\xb4R\x071\x81\xe6\xdc\xddS\x85\x84\xe8-Z%\x9d#G\x92\xa5\xed!\xcf\x8c\x1e\x08\x8bU\x1e\r\xcf\x84[\xa6\xe9f\xb3I$\xd7<1v\x9d\x8e!]\xbaO3*n\x8c\x1dH\x10\xa0\nA\x92\x84\xd0x\x11\x10\xb34\x88\x93\xa5{\xa9\xd2\xf1A\xfb\x0b(\xeb|o\xe8\x04\x00\x00'

I need to serve that response to the web browser, but honestly I don't know where the problem is. I think that if I decode the binary data I should get pure HTML content, and that content should be sent to the web browser as the response. I'm a little stuck :) Please help.

reg3x_mr

1 Answer

Please note this is in no way a correct or thorough treatment of an HTTP response. I'm merely responding to your gzip problem. To actually implement an HTTP cache, you probably need to read more... or be more specific in your question.

If you want to decompress the gzipped data, you can simply use:

import gzip

gzip.decompress(data)
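
As a minimal sketch of why decoding the whole response as UTF-8 fails: every gzip stream starts with the two magic bytes 0x1f 0x8b, and 0x8b is not a valid UTF-8 start byte. The payload below is made up for illustration; it is not the response from the question.

```python
import gzip

# Hypothetical payload, compressed the way a server would for Content-Encoding: gzip.
html = "<html><body>hello</body></html>"
compressed = gzip.compress(html.encode("utf-8"))

# Every gzip stream begins with the magic bytes 0x1f 0x8b.
assert compressed[:2] == b"\x1f\x8b"

# Decoding the compressed bytes as text fails on 0x8b, like the error in the question.
try:
    compressed.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as err:
    decode_failed = True
    print(err)

# Decompress first, then decode the result as text.
restored = gzip.decompress(compressed).decode("utf-8")
print(restored)
```

So the compressed body has to stay as bytes until it has been decompressed; only the decompressed result is text.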

For your response, you can try something similar to:

import gzip

response = b'''HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\nAccept-Ranges: bytes\r\nCache-Control: max-age=604800\r\nContent-Type: text/html; charset=UTF-8\r\nDate: Tue, 14 Jan 2020 16:34:02 GMT\r\nEtag: "3147526947"\r\nExpires: Tue, 21 Jan 2020 16:34:02 GMT\r\nLast-Modified: Thu, 17 Oct 2019 07:18:26 GMT\r\nServer: ECS (nyb/1D11)\r\nVary: Accept-Encoding\r\nX-Cache: HIT\r\nContent-Length: 648\r\n\r\n\x1f\x8b\x08\x00\xc2\x15\xa8]\x00\x03}TMs\xdb \x10\xbd\xfbWl\xd5K2#$\'i\x1a\x8f-i\xfa\x99i\x0fi\x0fi\x0f=\x12\xb1\xb2\x98\x08P\x01\xc9\xf6t\xf2\xdf\xbbB\x8e#7\x99\x9a\x91\x81]x\xbb\xef\xb1\x90\xbd\x12\xa6\xf4\xbb\x16\xa1\xf6\xaa)f\xd9c\x87\\\x143\xa0_\xe6\xa5o\xb0\xf8\xbc\xe5\xaam\x10>\x19\xc5\xa5\xce\xd2\xd1:\x1b\x97(\xf4\x1c\xca\x9a[\x87>\x8f:_\xb1E\x04i1q\xd6\xde\xb7\x0c\x7fw\xb2\xcf\xa3\x8fF{\xd4\x9e\ra#(\xc7Y\x1ey\xdc\xfat\x08\xbf:@\xbd\x84\xa4\xb9\xc2<\xea%nZc\xfdd\xffF\n_\xe7\x02{Y"\x0b\x93\x18\xa4\x96^\xf2\x86\xb9\x927\x98\x9f=A9\xbf#2C\x06\xfb\xc0\xa5s\xd1\xe8\xbb3b\x07\x7f\xc20Lyy\xbf\xb6\xa6\xd3\x82\x95\xa61v\t\xaf\xab9\xb5\xf3\xd5a\x89\xe2v-\xf5\x12\xe6O\xa6\x96\x0b!\xf5\xfa\xc8VQ\xa6\xac\xe2J6\xbb%0\xde\x92\x9c\xcc\xed\x9cG\x15\xc3\xd8\xb3N\xc6\xf0\xa1\x91\xfa\xfe\x86\x97\xb7\xc1tM\x9bb\x88nqm\x10~~\x8dh\xfc\xbdE\r\xb7\\\xbba\xf2\x05\x9b\x1e\xbd,9|\xc3\x0e\xc9r0\xc4\xf0\xde\x12w\xc2\xa6\xa5\xcc\xa1\x95\xd5S.a\xf0\x10\xfe\x85\xec\'t\x83pKx;\x9f\xb7\xdb\xe7\x0c/Q\x01\xef\xbcy\x81\xe89\xaa\xd5\x7fE\x13\xd4&\x19\xdc\x19+\xd02\xcb\x85\xec\x1c\xe9\x94\\\x1e\x01\x98-s5\x17fC\xc8\xed\x16.\xe8\xbb\xa2o\x18\xdb\xf5\x1d?\x99\xc7\xa1%\xf3\xf3\xd3\xd5\x84\x0c_\x0e\xea\xc5\xd4\xf7\xd2I\x8fbB\xed1\x93\x8b\xc5\x9b\xc5b\x92\xc9p\xfeL`i,\xf7\xd2\x10Km4NA\xdf)\x14\x92\xc3\x89\xe2[\xb6\xd7\xe7j\xd0\xe7t\x02~\xac\xe2QU\xfc\xa3\xd8D\xe5c\xc7\xc3$d\x96\x86\n-\xc2Ye\xe9x\x1dg\xd9P\x9bt;)\xd8\xbe\x8e\xeb\xb3g7\x93L\xa3\xaf-~\xd4\xd2\x81\x08v\xa0Qe,t\x0ea\x985M\xe7\xfc@\xb8G\xc0\x11\xc1\r\x0ez\x0e:E\xf7\xc9%\xf0\xcbtDb\x17\xb6xB\x1a\xabe\x8f\xa6\xa1!y\t\xa0\xb3Ht|m:\x0f\xad\x95\x14\xa24t\xb4R\x071\x81\xe6\xdc\xddS\x85\x84\xe8-Z%\x9d#G\x92\xa5\xed!\xcf\x8c\x1e\x08\x8bU\x1e\r\xcf\x84[\xa6\xe9f\xb3I$\xd7<1v\x9d\x8e!]\xbaO3*n\x8c\x1dH\x10\xa0\nA\x92\x84\xd0x\x11\x10\xb34\x88\x93\xa5{\xa9\xd2\xf1A\xfb\x0b(\xeb|o\xe8\x04\x00\x00'''

preamble, _, body = response.partition(b"\r\n\r\n")
preamble_lines = preamble.splitlines()
status = preamble_lines[0]
headers = dict()
for line in preamble_lines[1:]:
    header, value = line.decode("utf-8").split(":", 1)
    headers[header.lower()] = value.strip()

if "content-encoding" in headers and headers["content-encoding"] == "gzip":
    body = gzip.decompress(body)

print(f"Status: {status}")
print(f"Headers: {headers}")
print(f"Body:\n{body}")

which produces as output:

Status: b'HTTP/1.1 200 OK'
Headers: {'content-encoding': 'gzip', 'accept-ranges': 'bytes', 'cache-control': 'max-age=604800', 'content-type': 'text/html; charset=UTF-8', 'date': 'Tue, 14 Jan 2020 16:34:02 GMT', 'etag': '"3147526947"', 'expires': 'Tue, 21 Jan 2020 16:34:02 GMT', 'last-modified': 'Thu, 17 Oct 2019 07:18:26 GMT', 'server': 'ECS (nyb/1D11)', 'vary': 'Accept-Encoding', 'x-cache': 'HIT', 'content-length': '648'}
Body:
b'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>\n'
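
For the caching part, one possible approach (a sketch, not a complete HTTP cache) is to store the response exactly as received and open the cache file in binary mode, so no text decoding is involved. The function names mirror the ones in the question, but the signatures, the use of `os.path.join`, the cache key, and the temporary directory are assumptions made for this example.

```python
import os
import tempfile


def write_to_cache(path, key, http_response):
    """Store the raw response bytes; "wb" avoids any text encoding step."""
    full_path = os.path.join(path, key)
    with open(full_path, "wb") as cache_file:
        cache_file.write(http_response)


def open_cached_www(path, key):
    """Return the cached response as bytes, ready for socket.sendall()."""
    full_path = os.path.join(path, key)
    with open(full_path, "rb") as cache_file:
        return cache_file.read()


# Hypothetical response: ASCII headers followed by body bytes that are not valid UTF-8.
raw = b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n\x1f\x8b\x08\x00"

with tempfile.TemporaryDirectory() as cache_dir:
    write_to_cache(cache_dir, "example-key", raw)
    cached = open_cached_www(cache_dir, "example-key")

# The bytes read back are identical to what was stored.
print(cached == raw)
```

The cached bytes can then be passed to `client_socket.sendall(cached)` unchanged. Because the Content-Encoding header is still part of the stored response, the browser decompresses the body itself.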
petre
  • Your solution is very helpful, and this little piece of code taught me a lot. Thank you very much :))) But now I know that this is not a problem with encoding :)) I need to keep looking. – reg3x_mr Jan 15 '20 at 12:39
  • But there is one thing I still need to do: I should write this raw data to a file, then load it and send it to the web browser, but when I try that I get an error like this one: *** ERROR: 'utf-8' codec can't decode byte 0x8b in position 367: invalid start byte *** Do you know a workaround? – reg3x_mr Jan 15 '20 at 12:53
  • What's the actual string you get? I used utf-8 out of habit, but you'll have to check how HTTP is actually passing data around. For interpreting the encoding of the body (the content) sent to the browser, you'd use Content-Type (https://www.w3.org/International/articles/http-charset/index). But header values appear to be ASCII. See also: https://stackoverflow.com/questions/4400678/what-character-encoding-should-i-use-for-a-http-header – petre Jan 15 '20 at 14:23
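
The distinction in the last comment (one encoding for header lines, another for the body) can be sketched roughly as follows. The helper name `body_charset`, the ISO-8859-1 fallback, and the sample body are assumptions for illustration; the charset lookup uses the standard library's `email.message.Message.get_content_charset`.

```python
from email.message import Message


def body_charset(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, or a fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


# Header lines are decoded with a single-byte encoding, which never raises.
header_bytes = b"Content-Type: text/html; charset=UTF-8"
name, value = header_bytes.decode("iso-8859-1").split(":", 1)

# The body (after any gzip decompression) is decoded with the declared charset.
charset = body_charset(value.strip())
body = "<p>caf\u00e9</p>".encode("utf-8")  # hypothetical decompressed body
text = body.decode(charset)

print(charset)
print(text)
```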