From other posts on Stack Overflow, this should work:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)                 

s.connect(("www.cnn.com" , 80))
s.sendall("GET / HTTP/1.1\r\n")
print s.recv(4096)
s.close

But for some reason it just hangs (at recv) and never prints. I know that www.cnn.com will send its data chunked, but I should at least read something from recv, right?

P.S. I know this isn't the best way to do it and that there are libraries like httplib and urllib2 out there, but I can't use those for this project (it's for school). I have to use the socket library.

james smith

6 Answers

You forgot to send a blank line after your request line:

s.sendall("GET / HTTP/1.1\r\n\r\n")

Furthermore, HTTP/1.1 requires the Host header field in every request, as documented in the Host section of the HTTP/1.1 RFC:

s.sendall("GET / HTTP/1.1\r\nHost: www.cnn.com\r\n\r\n")
Takis
  • Pardon my noobness, but why is a Host header required if the socket is already connected to the server? – Jay Jul 26 '22 at 15:32
  • To allow name-based virtual hosting: when you connect to a hostname, the hostname is resolved (using DNS) to an IP address, and the actual TCP connection is made to that IP address. So if you host several websites on the same server using only one IP address, the server would not know which website you were trying to reach. By sending the "Host" header, the client tells the web server which website it wants. [Virtual Hosting](https://en.m.wikipedia.org/wiki/Virtual_hosting) – Takis Aug 17 '22 at 09:15
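
Here is a minimal Python 3 sketch of assembling the request this answer describes. build_get_request is a hypothetical helper name, and the code only builds the bytes; it does not open a connection. Every line ends with \r\n, the Host header names the site being requested, and an empty line marks the end of the headers.

```python
def build_get_request(host, path="/", extra_headers=None):
    """Return the bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    # Optional additional headers, e.g. {"Connection": "close"}
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF, and an empty line (a second CRLF) ends the headers
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_get_request("www.cnn.com"))
# b'GET / HTTP/1.1\r\nHost: www.cnn.com\r\n\r\n'
```

The result is byte-for-byte the request passed to sendall() in the answer, so in Python 3 it can be handed straight to s.sendall().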

Your code is almost right, but you need to send two \r\n sequences to satisfy the HTTP protocol: one ends the request line, and the second (an empty line) ends the headers.

A valid GET request will look like this (note the second, empty line):

GET / HTTP/1.1

So your code should be:

s.sendall('GET / HTTP/1.1\r\n\r\n')

Beyond that, valid HTTP/1.1 requests must include a Host: header. You need to add it to your request, something like this:

# each line ends with \r\n; the final empty line ends the request headers
s.sendall('GET / HTTP/1.1\r\n'
          'Host: cnn.com\r\n'
          '\r\n')
mhawke
  • This does answer my question and was first, so I guess I'll mark this as correct. For others, see my own answer as well. – james smith Dec 10 '15 at 01:25
  • @jamessmith: you should choose the best answer, not the first answer. Anyway, I think that Takis answered first :) – mhawke Dec 10 '15 at 01:42
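
The empty line matters because the receiving side keeps reading header lines until it sees one. The Python 3 sketch below shows that logic from the reader's side; read_head is a hypothetical helper, and socket.socketpair() stands in for a real client/server connection so it runs without a network. Until \r\n\r\n arrives, the loop keeps calling recv(), which is where a server sits waiting on the incomplete request from the question.

```python
import socket


def read_head(sock, bufsize=1024):
    """Read from sock until the blank line that ends an HTTP header block.

    Returns (head, rest): the header bytes without the terminating blank line,
    plus any bytes already received after it (e.g. the start of a body).
    """
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)  # blocks until more bytes arrive
        if not chunk:
            raise ConnectionError("connection closed before the end of the headers")
        data += chunk
    head, _, rest = data.partition(b"\r\n\r\n")
    return head, rest


# Simulate both ends of a connection locally
client, server = socket.socketpair()
with client, server:
    client.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
    head, rest = read_head(server)
    print(head.decode("ascii").split("\r\n"))
    # ['GET / HTTP/1.1', 'Host: example.com']
```

If the client had sent only "GET / HTTP/1.1\r\n", read_head() would stay blocked in recv(), just as the question's client does on the other side.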

Sorry to waste everyone's time. I just found this solution here on Stack Overflow (it just took some rewording of my Google search to find it).

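import socket

# HTTP/1.1 lines end with \r\n; "Connection: close" asks the server to close
# the socket after responding, so the recv() loop below can terminate
request = b"GET / HTTP/1.1\r\nHost: www.cnn.com\r\nConnection: close\r\n\r\n"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("cnn.com", 80))
s.sendall(request)
result = s.recv(10000)
while len(result) > 0:  # recv() returns b"" once the server closes the connection
    print(result)
    result = s.recv(10000)
s.close()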

All of the answers were also right about the terminating \r\n\r\n; however, those requests returned 301 statuses. This solution seems to follow the redirect somehow? Anyway, this solution worked for me.

james smith
  • That code gives a 302 response. It does not follow the redirect. Do you need to handle redirects for your school project? – mhawke Dec 10 '15 at 01:37
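
Whether a response is a redirect can be read from its status line, and the redirect target is in the Location header. Below is a minimal Python 3 sketch of parsing a response's header block (the bytes before the first empty line); parse_response_head is a hypothetical helper, and the sample response is made up for illustration. Actually following the redirect would mean sending a new request to the URL in Location, which this sketch does not do.

```python
def parse_response_head(head):
    """Split an HTTP response header block into (status_code, reason, headers).

    head is the raw bytes before the blank line; header names are lower-cased
    because HTTP header names are case-insensitive.
    """
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line, e.g. "HTTP/1.1 301 Moved Permanently"
    parts = lines[0].split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        if not line:
            continue  # skip empty lines such as a trailing CRLF
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers


sample = (b"HTTP/1.1 301 Moved Permanently\r\n"
          b"Location: http://www.example.com/\r\n"
          b"Content-Length: 0")
code, reason, headers = parse_response_head(sample)
if code in (301, 302, 303, 307, 308):  # the HTTP redirect status codes
    print("redirect to", headers["location"])  # redirect to http://www.example.com/
```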

I am cleaning up the examples for Python 3. We need a bytes/string conversion, and we can also close the connection automatically by using the socket in a with statement:

#!/usr/bin/env python3

import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:

    s.connect(("example.com" , 80))
    s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n\r\n")
    print(str(s.recv(4096), 'utf-8'))
Jan Bodnar
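
A single recv(4096) returns at most 4096 bytes, so for a larger page it prints only the beginning. One common approach, sketched here under that assumption rather than taken from this answer, is to send "Connection: close" so the server closes the socket after the response, then call recv() in a loop until it returns b"". recv_all is a hypothetical helper name; the demo uses socket.socketpair() instead of a real server so it runs offline.

```python
import socket


def recv_all(sock, bufsize=4096):
    """Collect everything the peer sends until it closes its end of the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b"" means the peer has closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


# Offline demo: one end plays the server and closes after replying
client, server = socket.socketpair()
server.sendall(b"HTTP/1.1 200 OK\r\nConnection: close\r\n\r\n<html></html>")
server.close()
response = recv_all(client)
client.close()
print(response.decode("utf-8"))
```

Against a real server, send b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n" with sendall() and then call recv_all(s). Decoding the joined bytes once, rather than each chunk, also avoids splitting a multi-byte UTF-8 character across two chunks. The body may still use chunked transfer encoding, which this sketch does not decode.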

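@james: you performed a Slowloris attack there without being aware of it. Slowloris works by sending incomplete requests so that the server keeps each connection open while it waits for the rest of the headers, which is what a request without the closing blank line does. I can't explain it better than this video: https://www.youtube.com/watch?v=XiFkyR35v2Y (I assume you already found the solution in the answers above; I just wanted to bring this to your attention. :)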

sibi
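
On the client side, one way to make a mistake like the one in the question fail with an error instead of hanging is to give the socket a timeout with settimeout(); recv() then raises socket.timeout (an alias of the built-in TimeoutError since Python 3.10) if nothing arrives in time. This is a minimal Python 3 sketch: recv_or_none is a hypothetical helper, the timeout value is an arbitrary choice, and socket.socketpair() simulates a peer that never answers.

```python
import socket


def recv_or_none(sock, timeout, bufsize=4096):
    """Return up to bufsize bytes, or None if nothing arrives within timeout seconds."""
    sock.settimeout(timeout)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None


# A peer that keeps the connection open but never replies, like a server
# still waiting for the rest of an incomplete request
client, silent_peer = socket.socketpair()
with client, silent_peer:
    print(recv_or_none(client, timeout=0.5))  # None, after about half a second
```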

Try replacing this line:

s.sendall("GET / HTTP/1.1\r\n")

with:

s.sendall("GET / HTTP/1.1\r\n\r\n")
                             ^^^^

Also, you need to replace s.close with s.close(): close is a method, and without the parentheses it is never called.

Remi Guan
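
The s.close versus s.close() difference is easy to check: without parentheses, Python only looks up the bound method and never calls it, so the socket stays open. A minimal Python 3 sketch, using fileno(), which returns -1 once a socket object has been closed:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

s.close  # only looks up the method; nothing is called
print(s.fileno() != -1)  # True: the socket is still open

s.close()  # actually closes the socket
print(s.fileno())  # -1
```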