I wrote this code to make a GET request manually using only Python sockets. It worked perfectly back in 2016 when I wrote it, but now that I need it again I keep getting a 400 Bad Request error. I tried switching Python versions, but the result is the same. I have looked through Stack Overflow questions asking more or less the same thing, but I still can't get it to work. I would appreciate it if anyone could help me out. Here is my code; I removed all the I/O and posted only the networking code.

URL_PATTERN = re.compile("^(.*://)?([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$")
HEADER_END = re.compile("\r\n\r\n")

URL_DATA = re.match(URL_PATTERN, INPUT_URL)
PROTOCOL = URL_DATA.groups()[0][:-3]
HOSTNAME = URL_DATA.groups()[1]
PATHNAME = URL_DATA.groups()[3] if URL_DATA.groups()[3] != "" else "/"
PORT = 80 if PROTOCOL == "http" else 443
BUFFER_SIZE = 4096

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOSTNAME, PORT))

s.send("GET " + PATHNAME + " HTTP/1.1\r\nHost: " + HOSTNAME + "\r\nConnection: close\r\n\r\n")

resp = s.recv(BUFFER_SIZE)
HEADER_INDEX = re.search(HEADER_END, resp).start()
HTTP_RESPONSE_HEADER = resp[:HEADER_INDEX]

s.close()

When I run my program on the URL https://doc.rust-lang.org/book/2018-edition/foreword.html, the variables in my program have the following values:

PORT: 443

PROTOCOL: https

HOSTNAME: doc.rust-lang.org

PATHNAME: /book/2018-edition/foreword.html

I then get a 400 Bad Request response back. I don't understand what I'm doing wrong and would appreciate any help I can get.

VictorVH
To put olin000's answer more pointedly: the problem is that you are making a plain HTTP request for an `https://` URL. Simply switching the port is not enough; you have to actually speak HTTP over TLS instead of plain HTTP. See olin000's answer for how you could implement this. – Steffen Ullrich Dec 08 '19 at 04:59

1 Answer

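I believe the problem is SSL/TLS: the URL uses https, so the server on port 443 expects a TLS handshake, but your code sends a plain-text HTTP request. For reference, see the question "Python socket server handle HTTPS request".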

I suggest you use:

context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)

and create a secure socket:

s_sock = context.wrap_socket(s, server_hostname=HOSTNAME)
s_sock.connect((HOSTNAME, PORT))
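
As a possible alternative to constructing `ssl.SSLContext` directly: version-specific constants such as `ssl.PROTOCOL_TLSv1_2` are deprecated in recent Python versions, and a context built that way does not verify the server certificate unless configured to. The standard library's `ssl.create_default_context()` returns a client context with verification and hostname checking enabled. The sketch below shows one way this could be wired in; the helper names `make_tls_context` and `open_stream` are hypothetical and not part of the original code.

```python
import socket
import ssl


def make_tls_context():
    # create_default_context() enables certificate verification and
    # hostname checking, and loads the system's trusted CA certificates.
    return ssl.create_default_context()


def open_stream(hostname, port, use_tls):
    # Hypothetical helper: open a TCP connection and, for HTTPS,
    # wrap it in TLS before any HTTP bytes are sent.
    sock = socket.create_connection((hostname, port), timeout=10)
    if use_tls:
        context = make_tls_context()
        # server_hostname is used for SNI and for hostname verification.
        sock = context.wrap_socket(sock, server_hostname=hostname)
    return sock
```

With the variables from the question, this could be called as `open_stream(HOSTNAME, PORT, PROTOCOL == "https")`, so that plain `http://` URLs still use an unwrapped socket.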

Additionally, in Python 3 sockets send bytes rather than str, so you need to encode the message before sending it.
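
To illustrate the encoding step, here is a minimal sketch that builds the same request as bytes. The function name `build_get_request` is hypothetical; the request line and headers are the ones from the question.

```python
def build_get_request(hostname, pathname):
    # Assemble the request line and headers; each line ends with CRLF,
    # and an empty line terminates the header section.
    lines = [
        "GET " + pathname + " HTTP/1.1",
        "Host: " + hostname,
        "Connection: close",
        "",
        "",
    ]
    # Sockets in Python 3 accept bytes, not str, so encode before sending.
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("doc.rust-lang.org", "/book/2018-edition/foreword.html")
```

The result can be passed to `sendall()`, which keeps writing until the whole buffer has been sent, whereas `send()` may transmit only part of it.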

In the end, your code could look like this:

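import re
import socket
import ssl

# Raw string, so the regex escapes "\-" and "\." are not treated as
# (invalid) Python string escapes.
URL_PATTERN = re.compile(r"^(.*://)?([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$")
HEADER_END = re.compile("\r\n\r\n")

INPUT_URL = "https://doc.rust-lang.org/book/2018-edition/foreword.html"

context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)

# Split the URL into protocol, host name and path.
URL_DATA = re.match(URL_PATTERN, INPUT_URL)
PROTOCOL = URL_DATA.groups()[0][:-3]
HOSTNAME = URL_DATA.groups()[1]
PATHNAME = URL_DATA.groups()[3] if URL_DATA.groups()[3] != "" else "/"
PORT = 80 if PROTOCOL == "http" else 443
BUFFER_SIZE = 4096

# Wrap the TCP socket in TLS before connecting; server_hostname enables SNI.
# Note that this always uses TLS, so it only works for https:// URLs.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s_sock = context.wrap_socket(s, server_hostname=HOSTNAME)
s_sock.connect((HOSTNAME, PORT))

# The socket expects bytes, so encode the request; sendall() sends all of it.
message = "GET " + PATHNAME + " HTTP/1.1\r\nHost: " + HOSTNAME + "\r\nConnection: close\r\n\r\n"
s_sock.sendall(message.encode('utf-8'))

# "Connection: close" makes the server close the connection when it is done,
# so keep reading until recv() returns an empty result.
resp = bytearray()
while True:
    part = s_sock.recv(BUFFER_SIZE)
    if not part:
        break
    resp += part

s_sock.close()

# Assumes the whole response is valid UTF-8.
resp_string = str(resp, 'utf-8')
HEADER_INDEX = re.search(HEADER_END, resp_string).start()
HTTP_RESPONSE_HEADER = resp_string[:HEADER_INDEX]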
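
Once the header has been separated from the body, the status code and individual header fields can be picked out of it. The following is a rough sketch under the assumption that the response is a well-formed HTTP/1.x message; `parse_response_head` is a hypothetical helper and `sample` is a made-up response used only for illustration.

```python
def parse_response_head(raw):
    # Split the header section from the body at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    # ISO-8859-1 maps every byte to a character, so decoding cannot fail.
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    version, status, reason = (lines[0].split(" ", 2) + ["", ""])[:3]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


# Made-up response bytes, not actual server output.
sample = (
    b"HTTP/1.1 400 Bad Request\r\n"
    b"Content-Type: text/html\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html></html>"
)
status, reason, headers, body = parse_response_head(sample)
```

Working on the raw bytes avoids decoding the body, which may not be UTF-8 for every URL.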
olin000