22

I would like to be able to construct a raw HTTP request and send it with a socket. I know the usual advice is to use something like urllib or urllib2, but I do not want to use those.

It would have to look something like this:

import socket

tcpsoc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcpsoc.bind(('72.14.192.58', 80)) #bind to googles ip
tcpsoc.send('HTTP REQUEST')
response = tcpsoc.recv()

Obviously, you would also have to request the page/file and pass GET and POST parameters.

Jacob Valenta
  • Well, in principle it's totally easy: you send 'GET someurl HTTP/1.1' followed by 'Host: theserversname' followed by two newlines. What makes it complicated is that there are a million options and a million possible replies that you have to parse (that's why one would usually say "use a library"). – Damon Apr 22 '11 at 12:47
  • You need tcpsoc.connect instead of bind. bind is for listening sockets... – Milan Apr 22 '11 at 12:51
  • Here's an example of connect: http://docs.python.org/library/socket.html#example – Milan Apr 22 '11 at 12:52
  • @jathanism Sometimes we like to reinvent the wheel to get an idea of how to make it better. – Whyrusleeping Dec 05 '12 at 07:49
  • Or to learn how the wheel works – Burrito Jul 31 '17 at 19:07
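
Putting the comments above together, here is a minimal sketch of the question's snippet with connect() in place of bind() and a buffer size passed to recv(). The helper names build_request and fetch are made up for illustration, and the sketch assumes Python 3, a plain-HTTP server on port 80, and a server that honours "Connection: close".

```python
import socket


def build_request(host, path="/"):
    """Return a minimal HTTP/1.1 GET request as bytes (hypothetical helper)."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "Connection: close",  # ask the server to close the socket when done
        "",                   # the empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=80, timeout=5):
    """Send the request over TCP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))      # connect(), not bind()
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)     # recv() requires a buffer size
            if not chunk:               # empty bytes: the server closed
                break
            chunks.append(chunk)
    return b"".join(chunks)


# Example call (needs network access):
# print(fetch("www.example.com").decode("iso-8859-1"))
```

The returned bytes contain the status line, the response headers, an empty line, and then the body, all of which still have to be parsed.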

5 Answers

34
import socket
from urllib.parse import urlparse  # Python 3; in Python 2 use "import urlparse"


CONNECTION_TIMEOUT = 5
CHUNK_SIZE = 1024
HTTP_VERSION = '1.0'
CRLF = "\r\n"

socket.setdefaulttimeout(CONNECTION_TIMEOUT)


def receive_all(sock, chunk_size=CHUNK_SIZE):
    '''
    Read from the socket until the server closes the connection.
    '''
    chunks = []
    while True:
        chunk = sock.recv(int(chunk_size))
        if chunk:
            chunks.append(chunk)
        else:
            break

    return b''.join(chunks)



def get(url, **kw):
    kw.setdefault('timeout', CONNECTION_TIMEOUT)
    kw.setdefault('chunk_size', CHUNK_SIZE)
    kw.setdefault('http_version', HTTP_VERSION)
    kw.setdefault('headers_only', False)
    kw.setdefault('response_code_only', False)
    kw.setdefault('body_only', False)
    url = urlparse(url)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(kw.get('timeout'))
    # hostname excludes any ":port" suffix, unlike netloc.
    sock.connect((url.hostname, url.port or 80))
    path = url.path or '/'
    if url.query:
        path += '?' + url.query
    # Request line, headers, then an empty line that ends the header block.
    msg = 'GET {0} HTTP/{1}{2}Host: {3}{2}Connection: close{2}{2}'
    request = msg.format(path, kw.get('http_version'), CRLF, url.hostname)
    sock.sendall(request.encode('ascii'))
    data = receive_all(sock, chunk_size=kw.get('chunk_size'))
    sock.close()

    data = data.decode(errors='ignore')
    # The header block ends at the first empty line; its first line is
    # the status line, e.g. "HTTP/1.0 200 OK".
    head, _, body = data.partition(CRLF + CRLF)
    status_line, _, headers = head.partition(CRLF)
    response_code = status_line.split()[1]

    if kw['body_only']:
        return body
    if kw['headers_only']:
        return headers
    if kw['response_code_only']:
        return response_code
    return data


print(get('http://www.google.com/'))
Ricky Wilson
17

Most of what you need to know is in the HTTP/1.1 spec, which you should definitely study if you want to roll your own HTTP implementation: http://www.w3.org/Protocols/rfc2616/rfc2616.html

Kristopher Johnson
8

Yes, basically you just have to write text, something like:

GET /pageyouwant.html HTTP/1.1[CRLF]
Host: google.com[CRLF]
Connection: close[CRLF]
User-Agent: MyAwesomeUserAgent/1.0.0[CRLF]
Accept-Encoding: gzip[CRLF]
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no-cache[CRLF]
[CRLF]

Feel free to add or remove headers at will, but keep Host: HTTP/1.1 requires it.
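
As a rough sketch of how the text above could be assembled in Python 3: each [CRLF] marker stands for the two bytes "\r\n", and the final empty line ends the header block. The function name build_get_request is made up for illustration; only a few of the headers from the listing are used.

```python
def build_get_request(path, headers):
    """Serialize a GET request; `headers` is a list of (name, value) pairs."""
    lines = ["GET {} HTTP/1.1".format(path)]
    lines += ["{}: {}".format(name, value) for name, value in headers]
    # Every line ends with CRLF ("\r\n"); one extra CRLF closes the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("/pageyouwant.html", [
    ("Host", "google.com"),
    ("Connection", "close"),
    ("User-Agent", "MyAwesomeUserAgent/1.0.0"),
])
print(request)
```

The resulting bytes can be passed directly to sock.sendall() on a connected socket.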

user703016
  • Hi! What is the name of the above text? `Raw request`, `Raw message`, or something else? – hasanghaforian Feb 09 '15 at 16:23
  • @hasanghaforian - If it is still relevant, the full text is called *HTTP header*. Optionally, it is followed by the real content, separated by one empty line. This is the case when you send a response back to the client or upload data to the server. – linusg Dec 06 '16 at 14:54
  • @linusg Thank you for your reply! – hasanghaforian Dec 06 '16 at 15:10
  • Is CRLF to be substituted by an actual line break, or should it be included literally? –  Sep 11 '18 at 14:30
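
To illustrate the comments above, here is a small sketch of a request that carries content after the header: a form-encoded POST, which also covers the POST parameters mentioned in the question. The function name build_post_request, the host www.example.com, and the path /submit are placeholders, not part of the thread. The header block and the body are separated by one empty line, and Content-Length tells the server how many body bytes follow.

```python
from urllib.parse import urlencode


def build_post_request(host, path, params):
    """Return a form-encoded POST request as bytes (hypothetical helper)."""
    body = urlencode(params).encode("ascii")
    head = "\r\n".join([
        "POST {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "Content-Type: application/x-www-form-urlencoded",
        "Content-Length: {}".format(len(body)),  # size of the body in bytes
        "Connection: close",
    ])
    # CRLF CRLF = end of the last header line plus the empty separator line.
    return head.encode("ascii") + b"\r\n\r\n" + body


request = build_post_request("www.example.com", "/submit",
                             {"name": "value", "q": "a b"})
head, _, body = request.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(body.decode("ascii"))
```

For a GET request the same urlencode() output would instead be appended to the path after a "?".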
3
"""
This module is a demonstration of how to send
an HTTP request from scratch with the socket module.
"""
import socket

__author__ = "Ricky L Wilson."
__email__ = "echoquote@gmail.com"
# The term CRLF refers to Carriage Return (ASCII 13, \r) followed by
# Line Feed (ASCII 10, \n). Together they mark the end of a line.
# Today's popular operating systems handle line endings differently,
# but HTTP always uses CRLF.
CRLF = '\r\n'
SP = ' '
CR = '\r'
HOST = 'www.example.com'
PORT = 80
PATH = '/'


def request_header(host=HOST, path=PATH):
    """
    Create a request header.
    """
    return CRLF.join([
        "GET {} HTTP/1.1".format(path), "Host: {}".format(host),
        "Connection: Close\r\n\r\n"
    ])


def parse_header(header):
    # The response-header fields allow the server
    # to pass additional information about the
    # response which cannot be placed in the
    # Status-Line.

    # These header fields give information about
    # the server and about further access to the
    # resource identified by the Request-URI.

    # Split on CRLF, not on CR alone; otherwise every
    # field name would start with a stray '\n'.
    header_fields = header.split(CRLF)
    # The first line of a Response message is the
    # Status-Line, consisting of the protocol version
    # followed by a numeric status code and its
    # associated textual phrase, with each element
    # separated by SP characters.

    # Get the numeric status code from the status
    # line.
    code = header_fields.pop(0).split(SP)[1]
    fields = {}
    for field in header_fields:
        key, value = field.split(':', 1)
        # Field names are case-insensitive; drop the
        # optional whitespace around the value.
        fields[key.lower()] = value.strip()
    return fields, code


def send_request(host=HOST, path=PATH, port=PORT):
    """
    Send an HTTP GET request.
    """

    # Create the socket object.
    """
    A network socket is an internal endpoint 
    for sending or receiving data within a node on 
    a computer network.

    Concretely, it is a representation of this 
    endpoint in networking software (protocol stack), 
    such as an entry in a table 
    (listing communication protocol, 
    destination, status, etc.), and is a form of 
    system resource.

    The term socket is analogous to physical 
    female connectors, communication between two 
    nodes through a channel being visualized as a 
    cable with two male connectors plugging into 
    sockets at each node. 

    Similarly, the term port (another term for a female connector) 
    is used for external endpoints at a node, 
    and the term socket is also used for an 
    internal endpoint of local inter-process 
    communication (IPC) (not over a network). 
    However, the analogy is limited, as network 
    communication need not be one-to-one or 
    have a dedicated communication channel.
    """
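    sock = socket.socket()
    # Connect to the server.
    sock.connect((host, port))
    # Send the request. Sockets carry bytes, so encode the text first;
    # sendall() keeps writing until every byte has been sent.
    sock.sendall(request_header(host, path).encode('ascii'))

    # Get the response: read until the server closes the connection.
    chunks = []
    chunk = sock.recv(4096)
    while chunk:
        chunks.append(chunk)
        chunk = sock.recv(4096)
    sock.close()
    # ISO-8859-1 maps every byte to a character, so decoding cannot fail.
    response = b''.join(chunks).decode('iso-8859-1')

    # HTTP headers will be separated from the body by an empty line
    header, _, body = response.partition(CRLF + CRLF)
    header, code = parse_header(header)
    return header, code, body


header, code, body = send_request(host='www.google.com')
print(code, CRLF, body)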
Ricky Wilson
0

For a working example to guide you, you might want to take a look at libcurl, a library written in the C language that:

  1. does what you want and much more;

  2. is a snap to use;

  3. is widely deployed; and

  4. is actively supported.

It's a beautiful thing and one of the best examples of what open source can and should be.

Pete Wilson