HTTP Create correct request for an html file in python

Question

So I've been spending some time trying to learn how to write HTTP requests.

My goal is to request the HTML of a web page, then parse it and extract data from it.

I'm having trouble understanding how I can do this if I don't have the exact path to the file and all I have is the basic URL, like www.google.com.

In a way, I'm trying to do what urllib.request does, but manually, with socket programming in Python.

# Playing with Sockets

import socket

target_port = 80
target_url = 'www.google.com'

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

client.connect((target_url, target_port))

request = "GET https://www.google.com HTTP/1.1\nHost:google.com\n\n"

message = request.encode()
client.send(message)

response = client.recv(4096)
print(response.decode())

It's not clear what you mean by "I don't have the exact path to the file". The URL *is* the exact path. — Daniel Roseman, Aug 05 '17 at 17:02

score 1 · Accepted Answer · answered Aug 05 '17 at 17:10

First of all, your HTTP request should use \r\n (hex values 0x0D and 0x0A) as the line separator. You're only using \n (0x0A). Here's a good Stack Overflow question on this.

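Second, the path in the request line is relative to the host address. When you call client.connect((target_url, target_port)), you are already connected to the host's HTTP server, so it is ready to accept a request that contains only the path on that host, not the full URL. If all you have is a bare URL like www.google.com, the path is simply / (the site's root page).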
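To make the host/path split concrete, here is a minimal sketch that uses the standard library's urllib.parse.urlsplit. The helper name split_url is made up for illustration; it is not part of the question's code, and it assumes plain URLs without ports or credentials.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (host, path) for a URL; the path defaults to "/".

    split_url is a hypothetical helper name used only for this sketch.
    """
    # urlsplit only recognises the host when a scheme or "//" is present,
    # so prepend "//" to bare names such as "www.google.com".
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)
    # An empty path means the root of the site.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, path


print(split_url("www.google.com"))
print(split_url("http://www.google.com/path/to/file.html"))
```

The host is what goes into connect() and the Host header; the path is what goes into the GET line.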

Ultimately, your request should look like this:

request= "GET /path/to/file.html HTTP/1.1\r\nHost:google.com\r\n\r\n"

You will probably need some additional headers in there as well.
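Putting the pieces together, a rough sketch of the whole exchange could look like the following. The function names build_request and fetch are made up for this example, and the choice of Connection: close as the extra header is an assumption: it asks the server to close the socket when the response is finished, so the read loop knows when to stop.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: relative path only
        f"Host: {host}",          # Host is mandatory in HTTP/1.1
        "Connection: close",      # assumption: have the server close when done
        "",                       # the two empty items produce the blank line
        "",                       # that terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=80):
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as client:
        client.sendall(build_request(host, path))
        chunks = []
        while True:
            data = client.recv(4096)
            if not data:          # empty read: the server closed the socket
                break
            chunks.append(data)
    return b"".join(chunks)


# Show the exact bytes that would be sent, without touching the network.
print(build_request("www.google.com"))
```

Calling fetch("www.google.com") returns the raw response, that is the status line, the headers, a blank line, and then the body. A single recv(4096) as in the question may return only part of that, which is why the sketch loops. This only covers plain HTTP on port 80; HTTPS would additionally need the socket wrapped with the ssl module.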

Take a look here for more information. If that link doesn't take you to the correct section, I was talking particularly about the HTTP 1.1 Clients section. The Sample HTTP Exchange section is great also. Actually, you will probably find the whole page to be very useful.
