
When I run the following code:

```python
import socket
import urlparse
import re
import os

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.connect(("www.google.co.in", 80))
s.send("GET /?gfe_rd=cr&gws_rd=cr HTTP/1.0\r\n\r\n")
data = s.recv(100000)
print data
s.close()
```

The response I get from Google is always the following:

```
HTTP/1.0 302 Found
Location: http://www.google.co.in/?gfe_rd=cr&gws_rd=cr
Cache-Control: private
Content-Type: text/html; charset=UTF-8
P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
Date: Mon, 04 Jan 2016 04:30:53 GMT
Server: gws
Content-Length: 245
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: NID=75=chG9KySsUncl-1elqXhs56m7cNHxFvFwNR5pZoavIwRJ2PpoGlm5RbShdsiF7udrTgwZgG-eRo4oQqA0RhbfwtExcxUGk88F_R2TNV9vi4XKhWSB9ihhcqulYTtg9xGkagSDPdFfmw; expires=Tue, 05-Jul-2016 04:30:53 GMT; path=/; domain=.google.com; HttpOnly

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.co.in/?gfe_rd=cr&amp;gws_rd=cr">here</A>.
</BODY></HTML>
```

I understand this is because I am not following the redirect. Could someone explain which URL I should connect to so that I do not get this response, or how I can fix this problem?

Curious

1 Answer


You are getting a `302 Found`, not a `404 Not Found`.

A 302 response code means a redirect: your browser would follow it silently and automatically, loading the page at the new location.
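Doing by hand what the browser does automatically starts with reading the status line and the `Location` header. Below is a minimal Python 3 sketch (the question uses Python 2; the helper name `parse_response` is hypothetical), assuming the entire response has already been read into a `bytes` object:

```python
def parse_response(raw):
    """Split a complete raw HTTP/1.x response (bytes) into its parts.

    Returns (status_code, reason, headers, body). Header names are
    lower-cased because HTTP header names are case-insensitive.
    Repeated headers (e.g. several Set-Cookie lines) overwrite each other
    here; that is a simplification of this sketch.
    """
    head, _, body = raw.partition(b"\r\n\r\n")    # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)                 # e.g. "HTTP/1.0 302 Found"
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")       # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return status, reason, headers, body


# Usage with a trimmed-down version of the response from the question.
raw = (b"HTTP/1.0 302 Found\r\n"
       b"Location: http://www.google.co.in/?gfe_rd=cr&gws_rd=cr\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"\r\n"
       b"<HTML>...</HTML>")
status, reason, headers, body = parse_response(raw)
if status in (301, 302):
    print("redirect to", headers["location"])
    # prints: redirect to http://www.google.co.in/?gfe_rd=cr&gws_rd=cr
```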

As you can see in the response body:

```
The document has moved
<A HREF="http://www.google.co.in/?gfe_rd=cr&amp;gws_rd=cr">here</A>.
```

Point your request at this URL and you should no longer receive a 302 response. Make sure you replace `&amp;` with `&`:

```python
s.send("GET /?gfe_rd=cr&gws_rd=cr HTTP/1.0\r\n\r\n")
```

Take a look at this link to unescape these URLs automatically, so you don't need to do it manually.

For example, in Python 3.5:

```python
import html
html.unescape('/?gfe_rd=cr&amp;gws_rd=cr')  # /?gfe_rd=cr&gws_rd=cr
```
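Combining the two steps (find the link in the body, then unescape it), a hypothetical Python 3 helper can also split the URL into the host to pass to `connect()` and the path to put after `GET`. This is a sketch that assumes a plain `http://` link; `redirect_from_body` is an illustrative name, not a library function:

```python
import html
import re
from urllib.parse import urlsplit


def redirect_from_body(body_html):
    """Return (host, port, request_target) for the first HREF in the body.

    Hypothetical helper: the regex is good enough for the small "302 Moved"
    page Google returned, but it is not a general HTML parser.
    """
    match = re.search(r'href="([^"]+)"', body_html, re.IGNORECASE)
    if match is None:
        return None
    url = html.unescape(match.group(1))      # turn &amp; back into &
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Assumes a plain http:// URL, so the default port is 80.
    return parts.hostname, parts.port or 80, target


body = '<A HREF="http://www.google.co.in/?gfe_rd=cr&amp;gws_rd=cr">here</A>.'
host, port, target = redirect_from_body(body)
request = "GET {} HTTP/1.0\r\n\r\n".format(target)
# host == "www.google.co.in", port == 80, target == "/?gfe_rd=cr&gws_rd=cr"
```

`host` and `port` are what would go into `s.connect((host, port))`, and `request` is the same line the answer sends.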
Martin Konecny
  • Even when I do that, I cannot seem to get the correct response. – Curious Jan 04 '16 at 04:57
  • What's the response now? I get an `HTTP/1.0 200 OK` with your fixed code. – Martin Konecny Jan 04 '16 at 05:00
  • I get a 200 OK response, but I cannot get this to work with other websites such as `www.cplusplus.com` (passed to `connect()`); the response code is 301 (Moved Permanently). – Curious Jan 04 '16 at 05:33
  • Is this an assignment? Is there any reason you aren't using `Requests` (http://docs.python-requests.org/en/latest/) for this? It would be a simple `requests.get('http://www.google.co.in')`. – Martin Konecny Jan 04 '16 at 05:45
  • Yep, it is an assignment. – Curious Jan 04 '16 at 05:48
  • I see. Well, a 301 is similar to a 302: a 302 is a temporary redirect and a 301 is a permanent one. Make sure you are reading the headers that are returned right before the body. One of them, `Location`, tells you exactly where the redirect points. – Martin Konecny Jan 04 '16 at 05:51
  • What was your original address? I see that `http://cplusplus.com/` 301-redirects to `http://www.cplusplus.com/`. – Martin Konecny Jan 04 '16 at 05:55
  • I called `connect()` with `"www.cplusplus.com"`. – Curious Jan 04 '16 at 05:57
  • Make sure your `GET /xxx` line is correct as well. In this case it should be `/`. – Martin Konecny Jan 04 '16 at 05:58
  • It is. The top line is `GET / HTTP/1.1\n`. Could you get the request to work? – Curious Jan 04 '16 at 06:10
  • It seems you need to send the `Host` header as well for that website: `s.send("GET / HTTP/1.0\r\n")` followed by `s.send("Host: www.cplusplus.com\r\n\r\n")` (note that the double `\r\n\r\n` goes on the last header only). – Martin Konecny Jan 04 '16 at 07:09
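The comment thread boils down to three changes to the original socket code: send a `Host` header, read until the server closes the connection rather than calling `recv()` once (a single `recv()` may return only part of the response), and keep following the `Location` header while the status is a redirect. A minimal Python 3 sketch of that, assuming plain `http://` URLs only; `fetch`, `build_request` and `read_all` are hypothetical names, and the `Connection: close` header is an addition of this sketch:

```python
import socket
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def build_request(host_header, target):
    # HTTP/1.0 request with an explicit Host header; "Connection: close"
    # asks the server to end the response by closing the socket.
    return ("GET {} HTTP/1.0\r\n"
            "Host: {}\r\n"
            "Connection: close\r\n"
            "\r\n").format(target, host_header).encode("ascii")


def read_all(sock):
    # Keep reading until the server closes the connection (recv() returns b"").
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def fetch(url, max_redirects=5):
    """Fetch a plain-http URL, following Location redirects.

    Returns (status, final_url, body). Sketch only: no https, no cookies.
    """
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        if parts.scheme != "http":
            raise ValueError("only plain http:// URLs are supported: " + url)
        port = parts.port or 80
        host_header = parts.hostname if port == 80 else "{}:{}".format(parts.hostname, port)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        with socket.create_connection((parts.hostname, port)) as s:
            s.sendall(build_request(host_header, target))
            raw = read_all(s)
        head, _, body = raw.partition(b"\r\n\r\n")
        lines = head.decode("iso-8859-1").split("\r\n")
        status = int(lines[0].split(" ", 2)[1])
        location = None
        for line in lines[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "location":
                location = value.strip()
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)   # Location may be a relative URL
            continue
        return status, url, body
    raise RuntimeError("too many redirects")
```

Following 303, 307 and 308 in addition to 301 and 302 is a choice of this sketch; for a plain `GET` all of them simply mean "request the `Location` URL instead".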