8

I'm trying to fetch HTML from a URL in Java, but all I get is 301 Moved Permanently. Other URLs work. What's wrong? This is my code:

        URL hh = new URL("http://hh.ru");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(hh.openStream()));
        StringBuilder sb = new StringBuilder();
        String inputLine;

        while ((inputLine = in.readLine()) != null) {
            sb.append(inputLine).append("\n");
        }
        in.close();
        String str = sb.toString(); // returns 301
Tony
  • If other URLs work, then nothing is wrong with your code. Is http://hh.ru a valid URL? – Soosh Aug 25 '13 at 17:04
  • When I visit that URL I get a 301 redirect. Here's a link to code that follows redirects: http://stackoverflow.com/questions/1884230/java-doesnt-follow-redirect-in-urlconnection – dcaswell Aug 25 '13 at 17:05

6 Answers

21

You're facing a redirect to another URL. That is quite normal, and a web site may have many reasons to redirect you. Just follow the redirect using the "Location" HTTP header, like this:

URL hh = new URL("http://hh.ru");
URLConnection connection = hh.openConnection();
// A redirect response carries the new address in the Location header.
String redirect = connection.getHeaderField("Location");
if (redirect != null) {
    connection = new URL(redirect).openConnection();
}
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
    System.out.println(inputLine);
}
in.close();

Your browser follows redirects automatically, but with URLConnection you may have to do it yourself: HttpURLConnection follows redirects by default only within the same protocol, not from HTTP to HTTPS. If that bothers you, take a look at other Java HTTP client implementations, such as Apache HTTP Client. Most of them can follow redirects automatically.
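As one illustration of a client that follows redirects for you, here is a minimal sketch using the JDK's own java.net.http.HttpClient (Java 11 and later) instead of Apache HTTP Client. The class name AutoRedirectFetch and the User-Agent value are assumptions made for the example, not part of the original answer.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AutoRedirectFetch {

    // Builds a client that follows redirects on its own.
    // Redirect.NORMAL follows every redirect except HTTPS -> HTTP.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Fetches the body of the page that the redirect chain ends at.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0") // hypothetical example value
                .GET()
                .build();
        HttpResponse<String> response =
                newClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Pass the URL on the command line, e.g. http://hh.ru
        if (args.length > 0) {
            System.out.println(fetch(args[0]));
        }
    }
}
```

With this approach the 301 never reaches your code; you only see the final response.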

Valentin Michalak
Jk1
  • But this shows me the HTML of the mobile version (see https://m.hh.ru). I want the full version. – Tony Aug 25 '13 at 17:26
  • It looks like version depends on user agent you're setting. This one gives a full version in my test: connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2"); – Jk1 Aug 25 '13 at 17:34
  • How can I test it next time if it doesn't work? I mean which strategy should I use to fix these parameters? – Tony Aug 25 '13 at 17:46
  • If you want to fetch the same pages as a browser does, then behave as much like a browser as possible: 1. Follow redirects and handle other HTTP codes 2. Set typical headers (User-Agent, Accept, Referer, etc.) when making a request 3. Use cookies and maintain authentication, if necessary – Jk1 Aug 25 '13 at 17:53
  • @Jk1 Maybe it would be a good idea to look at the error code before considering that the header field "location" is a redirection, wouldn't it? connection.getResponseCode()? – gouessej Aug 10 '20 at 22:11
  • @gouessej the only legitimate use case for the Location header other than 3XX is 201 Created, which is almost impossible to receive in response to a GET request. On the other hand, the response code check is always a good idea, whatever your client logic may be. – Jk1 Aug 13 '20 at 20:00
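The points raised in these comments (typical browser headers, cookies, and checking the response code before trusting Location) could be sketched roughly as follows. The class name BrowserLikeRequest, the helper names, and the header values are assumptions chosen for illustration; the site may well expect different values.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URL;

public class BrowserLikeRequest {

    // True for the status codes that announce a redirect via Location.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    // Prepares (but does not yet send) a request with typical browser headers.
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setInstanceFollowRedirects(false); // inspect redirects ourselves
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (X11; Linux x86_64)"); // example value
        conn.setRequestProperty("Accept", "text/html,application/xhtml+xml");
        conn.setRequestProperty("Accept-Language", "en-US,en;q=0.8");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            return; // pass the URL on the command line
        }
        // Keep cookies between requests made through URLConnection.
        CookieHandler.setDefault(new CookieManager());
        HttpURLConnection conn = open(args[0]);
        int status = conn.getResponseCode();
        if (isRedirect(status)) {
            System.out.println(status + " -> " + conn.getHeaderField("Location"));
        } else {
            System.out.println("Status " + status + ", no redirect");
        }
    }
}
```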
2

I found this answer useful and improved it a little to handle multiple redirections (e.g. 307 then 301).

URLConnection urlConnection = url.openConnection();
String redirect = urlConnection.getHeaderField("Location");
// Follow at most MAX_REDIRECTS hops; stop as soon as no Location header comes back.
for (int i = 0; i < MAX_REDIRECTS && redirect != null; i++) {
    urlConnection = new URL(redirect).openConnection();
    redirect = urlConnection.getHeaderField("Location");
}
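A slightly more defensive variant of the same loop might also check the status code, resolve relative Location values against the current URL, and stop on a redirect loop. This is only a sketch: the class name RedirectChain, the helper resolve, and the limit of 5 are assumptions, not something from the snippet above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

public class RedirectChain {

    static final int MAX_REDIRECTS = 5; // assumed limit

    // Resolves a Location value, which may be relative, against the current URL.
    static URL resolve(URL current, String location) throws MalformedURLException {
        return new URL(current, location);
    }

    // Follows redirects hop by hop and returns the first non-redirect connection.
    static HttpURLConnection follow(URL start) throws IOException {
        Set<String> seen = new HashSet<>();
        URL current = start;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            if (!seen.add(current.toExternalForm())) {
                throw new IOException("Redirect loop at " + current);
            }
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // handle every hop ourselves
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            boolean redirect = status >= 300 && status < 400 && location != null;
            if (!redirect) {
                return conn;
            }
            conn.disconnect();
            current = resolve(current, location);
        }
        throw new IOException("More than " + MAX_REDIRECTS + " redirects");
    }
}
```

Calling RedirectChain.follow(new URL("http://hh.ru")) would return the connection for the final page, from which getInputStream() can be read as before.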
Aviator
1

There's nothing wrong with your code. The message means that hh.ru has permanently moved to another URL, so the server is redirecting you there.

ProgramFOX
0

I tested your code and it is fine, but with "hh.ru" I get the same problem as you. When I use lynx (a command-line browser) to connect to "hh.ru", it reports that it is redirecting to another URL, then that the page has moved permanently, and then shows this alert:
"Alert!: This client does not contain support for HTTPS URLs"

Soosh
0

I resolved mine by requesting the specific file served by the server. Instead of http://hh.ru, I used http://hh.ru/index.php, and it worked for me.

Makyen
kimoduor
0

Check whether the URL provided is HTTP or HTTPS, and consider adding the protocol if you are using only a domain name, like http(s)://domainname.com/resource-name

Read: https://en.wikipedia.org/wiki/HTTP_301
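A small helper for the "add the protocol" case could look like the sketch below. The class name SchemeCheck and the choice of http:// as the default are assumptions for illustration; you may prefer https:// as the default.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class SchemeCheck {

    // Prepends "http://" when the input has no explicit http or https scheme.
    static String withScheme(String input) {
        String trimmed = input.trim();
        String lower = trimmed.toLowerCase();
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed;
        }
        return "http://" + trimmed;
    }

    public static void main(String[] args) throws MalformedURLException {
        // new URL("hh.ru") alone would throw MalformedURLException (no protocol).
        URL url = new URL(withScheme("hh.ru"));
        System.out.println(url); // prints http://hh.ru
    }
}
```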

Vishrant