My college assignment is to fetch a web page from any web server by URL using a TCP socket and HTTP GET request.

I am not getting an HTTP/1.0 200 OK response from any server.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.URL;
import java.util.Scanner;
import java.net.*;
public class DCCN042 {

    public static void main(String[] args) {
        Scanner inpt = new Scanner(System.in);
        System.out.print("Enter URL: ");
        String url = inpt.next();
        TCPConnect(url);
    }

    public static void TCPConnect(String url) {
        try {
            String hostname = new URL(url).getHost();
            System.out.println("Loading contents of Server: " + hostname);
            InetAddress ia = InetAddress.getByName(hostname);
            String ip = ia.getHostAddress();
            System.out.println(ip + " is IP Adress for  " + hostname);
            String path = new URL(url).getPath();
            System.out.println("Requested Path on the server: " + path);
            Socket socket = new Socket(ip, 80);
            // Create input and output streams to read from and write to the server
            PrintStream out = new PrintStream(socket.getOutputStream());
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            // Follow the HTTP protocol of GET <path> HTTP/1.0 followed by an empty line
            if (!hostname.equals(url)) {
                //Request Line
                out.println("GET " + path + " HTTP/1.1");
                out.println("Host: " + hostname);
                //Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            } else {
                //Request Line
                out.println("GET / HTTP/1.0");
                out.println("Host: " + hostname);
                //Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            }
            // Read data from the server until we finish reading the document
            String line = in.readLine();
            while (line != null) {
                System.out.println(line);
                line = in.readLine();
            }
            // Close our streams
            in.close();
            out.close();
            socket.close();
        } catch (Exception e) {
            System.out.println("Invalid URl");
            e.printStackTrace();
        }
    }
}

I create a TCP socket using the IP address returned by InetAddress.getHostAddress() and port 80 for the web server. I use getHost() and getPath() to split the hostname and path out of the URL, and I send that same hostname and path in the HTTP GET request.

The response from the server:

Enter URL: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
    Loading contents of Server: stackoverflow.com
    151.101.65.69 is IP Adress for  stackoverflow.com
    Requested Path on the server: /questions/33015868/java-simple-http-get-request-using-tcp-sockets
    HTTP/1.1 301 Moved Permanently
    cache-control: no-cache, no-store, must-revalidate
    location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets
    x-request-guid: 5f2af765-40c2-49ca-b9a1-daa321373682
    feature-policy: microphone 'none'; speaker 'none'
    content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com
    Accept-Ranges: bytes
    Transfer-Encoding: chunked
    Date: Mon, 27 Dec 2021 15:00:17 GMT
    Via: 1.1 varnish
    Connection: keep-alive
    X-Served-By: cache-qpg1263-QPG
    X-Cache: MISS
    X-Cache-Hits: 0
    X-Timer: S1640617217.166650,VS0,VE338
    Vary: Fastly-SSL
    X-DNS-Prefetch-Control: off
    Set-Cookie: prov=149aa0ef-a3a6-8001-17c1-128d6d4b7273; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
    
    0

My requirement is to get the HTML source code of this webpage, and an HTTP/1.0 200 OK response.

Remy Lebeau
Obaid
  • I hope this helps https://docs.oracle.com/cd/B40099_02/books/SiebInstWIN/SiebInstCOM_InstSWSE19.html#:~:text=the%20Web%20Server.-,The%20default%20HTTP%20and%20HTTPS%20ports%20for%20the%20Web,port%2080%20and%20443%2C%20respectively. – geobreze Dec 27 '21 at 15:28
  • Also, HTTPS is not using plain sockets to communicate. So you should rather use `SSLSocket` for HTTPS or find a site that doesn't have HTTPS. – geobreze Dec 27 '21 at 15:51
  • @geobreze, I was not using SSL Socket and hitting 'https'. Thank you it worked. – Obaid Dec 27 '21 at 16:20

1 Answer

This happens because you are using a plain Socket with port 80 hardcoded. Whether the URL you type starts with http or https, the request always goes out over the unencrypted HTTP protocol.

In this situation the server is telling you, as Samuel L. Jackson would say, "hey mf! you are trying to reach me through an f insecure protocol, HTTP. Use a secure one mf, the f HTTPS." So it responds with 301 (which means "this has moved permanently, use this URL instead of the original one"), with the Location header pointing to the correct URL, the https one.

So the 301 Location looks like the same URL you entered, but it is not the one you actually requested: your code always speaks plain HTTP on port 80, so the server saw a request for the http URL and redirected it to the https one.
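To make the redirect visible in code, here is a minimal sketch that inspects a status line and header lines like the ones in the question's output. The class name `RedirectCheck` and its helper methods are hypothetical names introduced for illustration; they are not part of the original program.

```java
import java.util.List;
import java.util.Locale;

public class RedirectCheck {

    /** Extracts the numeric code from a status line such as "HTTP/1.1 301 Moved Permanently". */
    public static int statusCode(String statusLine) {
        String[] parts = statusLine.trim().split("\\s+", 3);
        return Integer.parseInt(parts[1]);
    }

    /** Returns the Location header value, or null if absent. Header names are case-insensitive. */
    public static String location(List<String> headerLines) {
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("location")) {
                return line.substring(colon + 1).trim();
            }
        }
        return null;
    }

    /** True when the response is a 3xx redirect whose target is an https:// URL. */
    public static boolean redirectsToHttps(String statusLine, List<String> headerLines) {
        int code = statusCode(statusLine);
        String target = location(headerLines);
        return code >= 300 && code < 400
                && target != null
                && target.toLowerCase(Locale.ROOT).startsWith("https://");
    }

    public static void main(String[] args) {
        // Sample lines copied from the response shown in the question.
        String status = "HTTP/1.1 301 Moved Permanently";
        List<String> headers = List.of(
                "cache-control: no-cache, no-store, must-revalidate",
                "location: https://stackoverflow.com/questions/33015868/java-simple-http-get-request-using-tcp-sockets");
        System.out.println(statusCode(status));
        System.out.println(redirectsToHttps(status, headers));
    }
}
```

In the original program, the first line read from the socket would be the status line and the following lines up to the first empty line would be the headers.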

To make your code work with https, instead of a plain Socket use this:

SSLSocketFactory factory = (SSLSocketFactory)SSLSocketFactory.getDefault();
SSLSocket socket = (SSLSocket)factory.createSocket(ia, 443);

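Note that I pass ia, which was resolved from the host name, rather than the raw IP string ip. For HTTPS the certificate has to match the domain name, so if you connect using only the IP address, certificate validation can fail (this usually shows up as an SSLHandshakeException during the handshake).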
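As a rough sketch of the host name point: a bare SSLSocket does not necessarily check that the certificate matches the host unless endpoint identification is enabled, so it can be switched on explicitly. The class `HostnameTls` and its methods are hypothetical names for illustration; the `javax.net.ssl` calls are standard JSSE APIs.

```java
import java.io.IOException;
import java.util.List;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class HostnameTls {

    /** Enables HTTPS-style host name verification and sets the SNI name on the given parameters. */
    public static SSLParameters withHostnameCheck(SSLParameters base, String hostname) {
        // Ask JSSE to verify that the certificate matches the host, as HTTPS clients do.
        base.setEndpointIdentificationAlgorithm("HTTPS");
        // Explicit SNI; JSSE normally derives this from the host name passed to createSocket.
        List<SNIServerName> names = List.of(new SNIHostName(hostname));
        base.setServerNames(names);
        return base;
    }

    /** Opens a TLS connection to port 443 using the host name, not the IP string. */
    public static SSLSocket open(String hostname) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket socket = (SSLSocket) factory.createSocket(hostname, 443);
        socket.setSSLParameters(withHostnameCheck(socket.getSSLParameters(), hostname));
        socket.startHandshake(); // fails here if the certificate does not match
        return socket;
    }
}
```

The returned socket can be used exactly like the plain Socket in the question: wrap getOutputStream() and getInputStream() and send the same GET request.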

Now, whether to use Socket or SSLSocket is something you will have to decide programmatically, depending on the URL the user enters.
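One way that decision could look is sketched below, keyed on the URL scheme. The class `SchemeAwareRequest` and its method names are hypothetical. Two choices here are my assumptions rather than part of the answer above: the request uses HTTP/1.0 with Connection: close so the server closes the connection and the readLine() loop ends, and it omits Accept-Encoding so the body comes back uncompressed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLSocketFactory;

public class SchemeAwareRequest {

    /** True when the URL scheme is https. */
    public static boolean usesTls(URI uri) {
        return "https".equalsIgnoreCase(uri.getScheme());
    }

    /** Explicit port from the URL if present, otherwise 443 for https and 80 for http. */
    public static int portFor(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return usesTls(uri) ? 443 : 80;
    }

    /** Builds a minimal GET request; HTTP/1.0 and "Connection: close" are assumptions (see lead-in). */
    public static String buildRequest(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + uri.getHost() + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    /** Opens an SSL socket for https URLs and a plain socket otherwise. */
    public static Socket open(URI uri) throws IOException {
        String host = uri.getHost();
        int port = portFor(uri);
        return usesTls(uri)
                ? SSLSocketFactory.getDefault().createSocket(host, port)
                : new Socket(host, port);
    }

    /** Sends the request and prints the raw response (status line, headers, body). */
    public static void fetch(URI uri) throws IOException {
        try (Socket socket = open(uri)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(uri).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            fetch(URI.create(args[0]));
        } else {
            System.out.println("usage: java SchemeAwareRequest <url>");
        }
    }
}
```

With this structure, the TCPConnect method from the question would reduce to a call such as fetch(URI.create(url)), and the same code path handles both http and https input.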

gmanjon