import socket
import os.path

IP = "127.0.0.1"
PORT = 80
DEFAULT_URL = "C:\webroot\index.html"
SOCKET_TIMEOUT = 0.2


def get_file_data(filename):
    """ Get data from file """
    source_file = open(filename, 'rb')
    data = source_file.read()
    source_file.close()
    return data


def handle_client_request(resource, client_socket):
    """ Check the required resource, generate proper HTTP response and send         to client"""
    if resource == '/':
        url = DEFAULT_URL
    else:
        url = resource

    if os.path.isfile(url):
        http_header = "HTTP/1.0 200 OK\r\n"
    else:
        client_socket.send("404 (Not Found)\r\n" + "connection close")
        client_socket.close()

    file_type = url.split(".")[-1]

    if file_type == 'html' or file_type == 'txt':
        http_header += "Content-Type: text/html; charset=utf-8\r\n"
    elif file_type == 'jpg':
        http_header += "Content-Type: image/jpeg\r\n"
    elif file_type == 'js':
        http_header += "Content-Type: text/javascript; charset=UTF-8\r\n"
    elif file_type == 'css':
        http_header += "Content-Type: text/css\r\n"

    data = get_file_data(url)
    http_header += "Content-Length:" + str(len(data)) + "\r\n"
    http_response = http_header + "\r\n" + data
    client_socket.send(http_response)

def validate_http_request(request):
    """ Check if request is a valid HTTP request and returns TRUE / FALSE   and the requested URL """
    request_li = request.split("\r\n")[0].split(" ")
    if request_li[0] != "GET" or request_li[2] != "HTTP/1.1" '/':
        return False, ''
    return True, request_li[1]


def handle_client(client_socket):
    """ Handles client requests: verifies client's requests are legal HTTP, calls function to handle the requests """
    print 'Client connected'
    try:
        while True:
            client_request = client_socket.recv(1024)
            print client_request.split("\r\n")[0]
            valid_http, resource = validate_http_request(client_request)
            if valid_http:
                print 'Got a valid HTTP request'
                handle_client_request(resource, client_socket)
            else:
                print "Error: HTTP request isn't valid"
                break
        print "closing connection"
        client_socket.close()
    except socket.timeout:
        print "closing connections"
        client_socket.close()


def main():
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.bind((IP, PORT))
    server_socket.listen(10)
    print "Listening for connections on port %d" % PORT

    while True:
        client_socket, client_address = server_socket.accept()
        client_socket.settimeout(SOCKET_TIMEOUT)
        print 'New connection received'
        handle_client(client_socket)




if __name__ == "__main__":
    main()

I'm building an HTTP server for an assignment. The server is supposed to serve local files from my computer to the browser. Right now I'm trying to load the default URL.

First I get "/" as a request, which is good, but then I receive an empty request, which is treated as invalid and closes the connection. After the server accepts a new connection, it gets "/css/doremon.css" as a request. doremon.css is a file of the website I'm trying to serve. This causes an error at get_file_data because the path is supposed to be: "C:\webroot\css\doremon.css".

This raises two questions: 1. Why does the client send empty requests to the server? How can I prevent them from interrupting the connection? 2. From the third request it seems the client first sends the requested URL and then requests the files related to it. Is there a way to receive all of them at once? If not, how can I fix the path for the requested files?

Aviad
  • It seems like your questions are relative to actions of the client but you are only demonstrating server code - if the client is sending requests you do not know how to handle why are you not adjusting your server code to handle them? – Jmills Dec 29 '16 at 19:46
  • if your client is a web browser (not `curl`) you might be getting the `OPTIONS` HTTP method. How are you handling that? – Ereli Dec 29 '16 at 21:02
  • also worth reading about [keep-alive](https://en.wikipedia.org/wiki/HTTP_persistent_connection) and how HTTP connections are reused. here also http://stackoverflow.com/a/20799796/1265980 – Ereli Dec 29 '16 at 21:05

1 Answer


Yes. Typically web servers have the notion of a "web root" path relative to which all requests are fulfilled. But that is a notion that is created and enforced by the web server. So, let's call that path WebRoot (it sounds like that is what you want C:\webroot\ to be).

Usually the default URL is then WebRoot/index.html. And if there is something else referred to by that page (such as "/css/doremon.css"), the client will request that resource (GET /css/doremon.css ...), and the server responds with the contents of WebRoot/css/doremon.css. But that happens because the server appends the requested resource to its own notion of WebRoot. How would the client know to do that? As the author of the web server, it's your responsibility to do that before you call os.path.isfile.

This is important for security anyway. You don't want the client to be able to root around in arbitrary parts of your server's filesystem. The client shouldn't be able to get to anything outside of the WebRoot.

To implement this, you should do something like this:

WEBROOT = 'c:\\webroot\\'
# No leading slash: os.path.join discards WEBROOT if the next part is rooted.
DEFAULT_URL = os.path.join(WEBROOT, "index.html")

And when you handle a request:

if resource == '/':
    url = DEFAULT_URL
else:
    # Strip the leading '/' so the path is joined relative to WEBROOT.
    url = os.path.join(WEBROOT, resource.lstrip('/'))
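Since the point of a web root is that the client cannot reach anything outside it, the mapping from request path to file path is worth isolating in one place. The following is only a sketch in Python 3 (the question's code is Python 2): `resolve_resource` and its `default` parameter are hypothetical names, not part of the original server. It strips the leading slash before joining, because `os.path.join` throws away earlier components when a later one is absolute, decodes percent-escapes, and rejects any path that resolves outside the web root.

```python
import os
from urllib.parse import unquote


def resolve_resource(webroot, resource, default="index.html"):
    """Map a request path onto a file under webroot, or return None.

    Hypothetical helper for illustration; not part of the original server.
    """
    # Drop any query string, then decode percent-escapes such as %2e%2e.
    path = unquote(resource.split("?", 1)[0])
    # Strip leading separators so os.path.join cannot discard the web root.
    relative = path.lstrip("/\\") or default
    root = os.path.realpath(webroot)
    candidate = os.path.realpath(os.path.join(root, relative))
    try:
        inside = os.path.commonpath([root, candidate]) == root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        inside = False
    # Anything that escapes the web root (e.g. "/../secret") is refused.
    return candidate if inside else None
```

A caller would treat a `None` result the same way as a missing file and answer with a 404 instead of opening the path.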

For your question 1, it's likely that the client is not actually sending empty requests. However, you aren't handling "end of file" correctly. Your basic loop should be something like:

while True:
    client_request = client_socket.recv(1024)
    if client_request == '':
        # Client closed connection
        break
    valid_http, resource = validate_http_request(client_request)
    [...]

For your question 2, as @Ereli said, you should look into HTTP keep-alive. That's a mechanism to keep the connection open, but it will only be used if both sides agree to do so. If you wish to support that, you need to advertise it in the headers you return, as described in the link. Your server answers with HTTP/1.0, so if you do not advertise it, the client assumes a single request per connection. Hence, a closed connection is the correct behavior here until you provide the Connection: keep-alive header (and closing is still always valid behavior anyway). You are interpreting that closed connection (signaled by an empty buffer returned from recv) as an "empty request".
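To make the "advertise it in the headers" step concrete, here is a minimal sketch of building a response that carries a Connection header. It is written for Python 3, where sockets send bytes, and `build_response` is a hypothetical helper rather than anything from the question's code. It keeps the HTTP/1.0 status line the question uses; the Content-Length header matters more once the connection stays open, because it is how the client knows where the body ends.

```python
def build_response(body, content_type, keep_alive=False):
    """Return a complete HTTP response as bytes (hypothetical helper)."""
    headers = [
        "HTTP/1.0 200 OK",
        "Content-Type: " + content_type,
        # Lets the client find the end of the body without waiting
        # for the connection to close.
        "Content-Length: " + str(len(body)),
        # Advertise whether the server will keep the connection open.
        "Connection: " + ("keep-alive" if keep_alive else "close"),
    ]
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + body
```

The result can be passed to `client_socket.sendall(...)`. If `keep_alive` is false, the server should close the socket after sending.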

Finally, if/when you do handle keep-alive, you will need to do a more thorough job of parsing headers. An HTTP request starts with a single request line (usually containing GET, but there are other verbs as well), followed by an indeterminate number of additional header lines, followed by an empty line, and then possibly additional data (the additional data are not present for a GET but may be for PUT and POST).
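The request layout described here (request line, header lines, blank line) can be parsed with a few string operations. This is a sketch under assumptions: `parse_request_head` is a hypothetical name, it expects text that has already been decoded from the socket's bytes, and it ignores any body that follows the blank line.

```python
def parse_request_head(head):
    """Split a request head into (method, target, version, headers).

    Hypothetical helper; returns None if the head is malformed.
    """
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3:
        return None
    method, target, version = parts
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # the blank line marks the end of the headers
        name, sep, value = line.partition(":")
        if not sep:
            return None  # a header line without a colon is invalid
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers
```

With the headers in a dictionary, the server can check `headers.get("connection")` to see whether the client asked for keep-alive.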

Now all of those lines may not be delivered with a single recv call. Hence, you need to keep receiving data until you find the blank line signaling the end of the request (or you could -- and probably should -- impose some arbitrary limit so as to protect yourself from an attacker attempting to use up all your memory by feeding you headers forever).
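A receive loop along these lines might look like the sketch below (Python 3, so the data are bytes). `read_request_head` and the 8192-byte cap in `MAX_HEAD_BYTES` are assumptions made for illustration; the cap is the "arbitrary limit" mentioned above, not a value taken from any standard.

```python
import socket

MAX_HEAD_BYTES = 8192  # arbitrary cap chosen for this sketch


def read_request_head(sock, buffered=b""):
    """Receive until the blank line that ends the headers.

    Returns (head, rest), where rest is whatever arrived after the
    blank line. Returns (None, b"") if the peer closed the connection
    first, and raises ValueError if the head grows past MAX_HEAD_BYTES.
    """
    data = buffered
    while b"\r\n\r\n" not in data:
        if len(data) > MAX_HEAD_BYTES:
            raise ValueError("request head too large")
        chunk = sock.recv(1024)
        if not chunk:
            return None, b""  # an empty read means the client closed
        data += chunk
    head, _, rest = data.partition(b"\r\n\r\n")
    return head, rest
```

The `rest` value should be passed back in as `buffered` on the next call, so bytes that arrived early are not lost.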

Also, you should be aware of the possibility that you could receive more than a single request from a single recv. For example, once you have advertised keep-alive, the client could send two requests back to back (called pipelining). Those might show up in your buffer via a single receive. So the important thing to remember is that, for HTTP, you need to be parsing lines, not just processing a complete bufferful at a time.
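To illustrate the pipelining case, the sketch below splits one buffer into however many complete request heads it contains and hands back the unfinished remainder. `split_pipelined` is a hypothetical helper, and it assumes requests without bodies (such as GET); requests with bodies would also need their Content-Length honored.

```python
def split_pipelined(buffer):
    """Split a byte buffer into complete request heads plus leftover bytes.

    Hypothetical helper; assumes bodiless requests such as GET.
    """
    heads = []
    while True:
        head, sep, rest = buffer.partition(b"\r\n\r\n")
        if not sep:
            # No terminator yet: keep the partial data for the next recv.
            return heads, buffer
        heads.append(head)
        buffer = rest
```

Each returned head can be handled in order, and the leftover bytes are prepended to whatever the next recv delivers.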

Gil Hamilton
  • Great answer. It might be interesting to add a check for requests like `/../mySecretFile` (those dots could also be encoded). – Danny_ds Dec 30 '16 at 08:39