I am writing a program to output a website's HTML code. I have tested it on some sites, such as https://www.stackoverflow.com, and it works. However, when I run it against https://www.science.energy.gov, it throws an IOException. If I change https to http and run it against http://www.science.energy.gov, the program runs but prints nothing. I am not sure why the HTML for the http version of the site is not displayed.

Below is the relevant code for the HTML extraction program:

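import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class URLReader {
   public static void main(String[] args) {
      try {
         URL url = new URL("https://science.energy.gov/");
         // try-with-resources closes the stream even if reading fails, and avoids
         // calling close() on a null stream when openStream() itself fails.
         // BufferedReader replaces the deprecated DataInputStream.readLine().
         // UTF-8 is assumed here; the page may declare a different charset.
         try (BufferedReader reader = new BufferedReader(
               new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {  // openStream() throws an IOException
            String line;
            while ((line = reader.readLine()) != null) {
               System.out.println(line);
            }
         }
      } catch (MalformedURLException mue) {
         mue.printStackTrace();
      } catch (IOException ioe) {
         ioe.printStackTrace();
      }
   }
}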
coder

1 Answer

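That's because a request to http://science.energy.gov/ is answered with a redirect to the https version of the site. Java's HttpURLConnection follows redirects automatically only when the protocol stays the same, so it does not follow this http-to-https redirect. Your program therefore reads the body of the redirect response, which is typically empty, and stops. No output, no error.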
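One way to handle this is to follow the redirect by hand. The sketch below is only an illustration, not the asker's code: the class name `RedirectFollower`, the helper names, and the cap of 5 hops are all made up for this example. It disables automatic redirects, reads the `Location` header, and reconnects to the new URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for this example only.
final class RedirectFollower {
   private static final int MAX_REDIRECTS = 5; // arbitrary cap to avoid redirect loops

   // True for the status codes that normally carry a Location header.
   static boolean isRedirect(int status) {
      return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
   }

   // Resolves a Location value (absolute or relative) against the URL that returned it.
   static String resolveLocation(String base, String location) {
      return URI.create(base).resolve(location).toString();
   }

   // Opens the URL, following redirects manually so http -> https hops are followed too.
   static HttpURLConnection open(URL url) throws IOException {
      for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
         conn.setInstanceFollowRedirects(false);
         int status = conn.getResponseCode();
         String location = conn.getHeaderField("Location");
         if (!isRedirect(status) || location == null) {
            return conn; // final response: let the caller read it
         }
         conn.disconnect();
         url = new URL(resolveLocation(url.toString(), location));
      }
      throw new IOException("Too many redirects");
   }

   public static void main(String[] args) throws IOException {
      HttpURLConnection conn = open(new URL("http://science.energy.gov/"));
      // UTF-8 is assumed here; the page may declare a different charset.
      try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
         String line;
         while ((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      }
   }
}
```

Following the redirect only gets the program as far as the https URL. If the JVM cannot validate that site's certificate, the https hop will still fail with the same handshake exception as the direct https request.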

Now about the SSLHandshakeException (a subclass of IOException, which is why you see it as an IOException). The error explains itself: "unable to find valid certification path to requested target". This means your Java trust store does not contain a certificate that lets it validate the server you are trying to connect to. So you need to obtain the public certificate from that server and import it into the trust store. Read this answer for more information.
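As a rough sketch of that approach, assuming the server's certificate has already been saved to a file and imported into a keystore with `keytool`: the file names `server.crt` and `truststore.jks`, the alias, the password `changeit`, and the class name `TrustStoreSetup` are all placeholders for this example. The program then points the JVM at that keystore before opening the connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for this example only.
final class TrustStoreSetup {
   // Points the JVM's default TLS trust store at a custom keystore file.
   // Call this before the first HTTPS connection is made, because the
   // default SSL context is initialised from these properties on first use.
   static void useTrustStore(String path, String password) {
      System.setProperty("javax.net.ssl.trustStore", path);
      System.setProperty("javax.net.ssl.trustStorePassword", password);
   }

   public static void main(String[] args) throws IOException {
      // Placeholder keystore, assumed to have been created beforehand with something like:
      //   keytool -importcert -alias science-energy -file server.crt
      //           -keystore truststore.jks -storepass changeit
      useTrustStore("truststore.jks", "changeit");

      URL url = new URL("https://science.energy.gov/");
      // UTF-8 is assumed here; the page may declare a different charset.
      try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
         String line;
         while ((line = reader.readLine()) != null) {
            System.out.println(line);
         }
      }
   }
}
```

Note that a custom trust store set this way replaces the JVM's default one, so connections to other https sites may stop validating unless their certificate authorities are in the same keystore.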



Roshana Pitigala
  • Thank you for the quick response, but do you know how I can get rid of the IOException from the `https://science.energy.gov/`? – coder Apr 15 '18 at 07:22