
I'm trying to download an HTML page from a URL:

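import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public static void getHtml() {
    // URL is a String field holding the page address, defined elsewhere in the class.
    // try-with-resources closes the reader (and the underlying stream) automatically.
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new URL(URL).openStream()))) {
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    } catch (IOException e) {
        // Report the failure instead of silently swallowing it.
        e.printStackTrace();
    }
}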

Instead of the HTML I want, it gives me the following:

<html>
 <head>
  <title>loading</title>
 </head>
 <body>
  <p>Please wait...</p>
       <script>document.cookie="a=3c5hb1488cb3eghv3r456t12234jfyr7g;path=/;";location.href=document.location.pathname;</script>
 </body>
</html>

How do I download the content of the webpage directly? I also tried jsoup and Apache; both gave the same result.

Valera

1 Answer


Here is my guess about how the website works:

  1. It returns this page to a first-time visitor.
  2. The browser runs the script, which sets a cookie and reloads the same URL.
  3. With the cookie present, the server responds with the real content.

So it works in a browser but not in Java, because the Java code neither runs the script nor sends the cookie.

You could parse the cookie-setting script and replay the cookie yourself: "a=3c5hb1488cb3eghv3r456t12234jfyr7g;path=/;"
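As a rough illustration of the parsing step, the sketch below pulls the name=value pair out of a document.cookie assignment with a regular expression. The class name CookieScriptParser and the method extractCookie are made up for this example, and the pattern assumes the script looks like the one in the question (a double-quoted string assigned to document.cookie); a site that builds the cookie differently would need a different pattern.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class CookieScriptParser {
    // Matches document.cookie="name=value;..." and captures only the name=value pair,
    // stopping at the first ';' so attributes such as path=/ are left out.
    private static final Pattern COOKIE_ASSIGNMENT =
            Pattern.compile("document\\.cookie\\s*=\\s*\"([^\";]+)");

    /** Returns the first name=value pair assigned to document.cookie, if any. */
    static Optional<String> extractCookie(String html) {
        Matcher m = COOKIE_ASSIGNMENT.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String page = "<script>document.cookie=\"a=3c5hb1488cb3eghv3r456t12234jfyr7g;path=/;\";"
                + "location.href=document.location.pathname;</script>";
        // prints a=3c5hb1488cb3eghv3r456t12234jfyr7g
        System.out.println(extractCookie(page).orElse("no cookie found"));
    }
}
```

Only the name=value part is kept because attributes like path=/ tell the browser where to store the cookie; they are not sent back to the server.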

For setting cookies on a URL connection, refer to the post "URLConnection with Cookies?".
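A minimal sketch of sending the cookie with plain java.net classes might look like this. CookieFetcher and fetchWithCookie are illustrative names, the response is assumed to be UTF-8, and whether the server accepts a replayed cookie value is a guess about this particular site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class CookieFetcher {
    /** Downloads pageUrl, sending the given cookie (a name=value pair) in the Cookie header. */
    static String fetchWithCookie(String pageUrl, String cookie) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        // The header must be set before the connection is actually opened.
        conn.setRequestProperty("Cookie", cookie);

        StringBuilder body = new StringBuilder();
        // Assumption: the page is UTF-8 encoded.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }
}
```

A call would then be something like fetchWithCookie("http://example.com/", "a=3c5hb1488cb3eghv3r456t12234jfyr7g"), where the URL is a placeholder. If the server issues a fresh cookie value on each first visit, the value has to be read from the "loading" page first rather than hard-coded.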

Alternatively, use Apache HTTP Client: http://hc.apache.org/httpclient-3.x/

Community