1

Good day. I have just switched from Objective-C to Java and am trying to read the contents of a URL into a string properly. I have read tons of posts, but the output is still garbage.

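import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class TableMain {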

    /**
     * @param args
     */
    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception {
        URL url = null;
        URLConnection urlConn = null;

        try {
            url = new URL("http://svo.aero/timetable/today/");
        } catch (MalformedURLException err) {
            err.printStackTrace();
        }
        try {
            urlConn = url.openConnection();
        } catch (IOException e) {
            e.printStackTrace();
        }

        try {
            BufferedReader input = new BufferedReader(new InputStreamReader(
                    urlConn.getInputStream(), "UTF-8"));
            StringBuilder strB = new StringBuilder();
            String str;
            while (null != (str = input.readLine())) {
                strB.append(str).append("\r\n");
                System.out.println(str);
            }
            input.close();
        } catch (IOException err) {
            err.printStackTrace();
        }
    }
}

What's wrong? I get something like this:

??y??'??)j1???-?q?E?|V??,??< 9??d?Bw(?э?n?v?)i?x?????Z????q?MM3~??????G??љ??l?U3"Y?]????zxxDx????t^???5???j?‌​?k??u?q?j6?^t???????W??????????~?????????o6/?|?8??{???O????0?M>Z{srs??K???XV??4Z‌​??'??n/??^??4????w+?????e???????[?{/??,??WO???????????.?.?x???????^?rax??]?xb??‌​& ??8;?????}???h????H5????v?e?0?????-?????g?vN

Pshemo
Morckovka
  • Please post the actual error/output you're getting. – Brian Sep 25 '12 at 21:51
  • Look at the responses to [this question](http://stackoverflow.com/questions/5769717/how-can-i-get-an-http-response-body-as-a-string-in-java). [This one](http://stackoverflow.com/a/5769756/646634) and [this one](http://stackoverflow.com/a/5769991/646634) are especially useful to what you're trying to do. – Brian Sep 25 '12 at 22:04
  • Just to rule it out, is your default Charset set to UTF-8 too? You can check with a `System.out.println(Charset.defaultCharset());` – MrLore Sep 25 '12 at 22:08
  • `@SuppressWarnings("deprecation")`: should you be using a deprecated method? – gigadot Sep 25 '12 at 22:09
  • Sorry for the very stupid question, but where can I import IOUtils from? I came across this topic before but still couldn't find it. I tried `import org.apache.commons.io.IOUtils.*;` – Morckovka Sep 25 '12 at 22:11
  • @Morckovka From the Apache Commons website: http://commons.apache.org/io/download_io.cgi – MrLore Sep 25 '12 at 22:18
  • The default charset is x-MacCyrillic. – Morckovka Sep 25 '12 at 22:20
  • @Morckovka Then you need to print using UTF-8, see: http://stackoverflow.com/questions/10143998/cyrillic-in-windows-consolejava-system-out-println – MrLore Sep 25 '12 at 22:23
  • @MrLore That's an article about Windows. Well, it seems I'll have to write it to a file or switch to Windows. – Morckovka Sep 25 '12 at 22:29
  • @Morckovka The first answer's solution should be platform independent. – MrLore Sep 25 '12 at 22:30
  • @Morckovka You can write the contents of the URL to a file to see that my code works correctly. Printing to your editor's console would be a bit harder unless you specify its encoding as well. – Nikola Yovchev Sep 26 '12 at 08:10
  • Well, I used the method that Brian pointed to. If you want to change the encoding of the output shown in the terminal, click on your Java file, go to Properties -> Text file encoding, and change it from the default to what you need. – Morckovka Sep 26 '12 at 09:12
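
The two suggestions from the comments above (checking the default charset and printing through a UTF-8 stream) can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `ConsoleEncodingCheck` and the helper `utf8Stream` are made up for the example, and whether the text then displays correctly still depends on the terminal or IDE console being set to UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleEncodingCheck {

    // Hypothetical helper: wraps a stream so that everything printed through it
    // is encoded as UTF-8, regardless of the JVM's default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Shows what System.out uses when no encoding is given (e.g. x-MacCyrillic).
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Prints a Cyrillic word ("Moscow", written with escapes) as UTF-8 bytes.
        PrintStream out = utf8Stream(System.out);
        out.println("\u041c\u043e\u0441\u043a\u0432\u0430");
    }
}
```

If the console itself expects a different encoding, the UTF-8 bytes will still be shown incorrectly, which is why writing to a file is a simpler way to verify the data.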

2 Answers

-1

Here is a method using HttpClient:

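// Requires Apache HttpClient 4.x (org.apache.http.*) plus
// java.io.BufferedReader, java.io.IOException and java.io.InputStreamReader.
// httpClient and NEW_LINE are fields of the enclosing class (not shown here).

public HttpResponse getResponse(String url) throws IOException {
    httpClient.getParams().setParameter("http.protocol.content-charset", "UTF-8");
    return httpClient.execute(new HttpGet(url));
}

public String getSource(String url) throws IOException {
    StringBuilder sb = new StringBuilder();
    HttpResponse response = getResponse(url);
    if (response.getEntity() == null) {
        throw new IOException("Response entity not set");
    }
    // Decode the body explicitly as UTF-8 instead of the platform default charset.
    BufferedReader contentReader = new BufferedReader(
            new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
    try {
        String line = contentReader.readLine();
        while (line != null) {
            sb.append(line).append(NEW_LINE);
            line = contentReader.readLine();
        }
    } finally {
        contentReader.close();
    }
    return sb.toString();
}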

Edit: I edited the answer to ensure it uses UTF-8.
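
For comparison, here is a minimal sketch of the same read loop using only the standard library, with the charset passed explicitly to the reader. It is an illustration under assumptions rather than part of the original answer: the class `StreamToString` and the method `readAll` are hypothetical names, and an in-memory stream stands in for the HTTP response body so the example runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Hypothetical helper: decodes the whole stream with the given charset and
    // returns it as one string, with each line terminated by '\n'.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep the example short.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a response body: Cyrillic text encoded as UTF-8 bytes.
        byte[] body = "\u0420\u0435\u0439\u0441 SU 100\n".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println("Decoded " + text.length() + " characters");
    }
}
```

The point of the sketch is that the decoding charset is chosen where the `InputStreamReader` is created; `readLine` itself has no say in the encoding.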

Nikola Yovchev
  • Emm, sorry, but I don't see where you specify the encoding? – Morckovka Sep 25 '12 at 21:55
  • For UTF-8 encoding the call to readLine will do. You can use other readers which allow you to specify the encoding, but in your question you did not specify anything about encoding. – Nikola Yovchev Sep 25 '12 at 22:00
  • @baba The title specifies UTF-8. – Brian Sep 25 '12 at 22:01
  • Anywho, then you can use DefaultHttpClient: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/client/DefaultHttpClient.html. As you see, it uses DEFAULT_CONTENT_CHARSET if not configured, but you can configure it. – Nikola Yovchev Sep 25 '12 at 22:03
-1

This is a result of:

  1. You are fetching data that is UTF-8 encoded
  2. You didn't specify, but I surmise you are printing it to the console on a Windows system

The data is being received and stored correctly, but when you print it the destination is incapable of rendering the Russian text. You will not be able to just "print" the text to stdout unless the ultimate display handler is capable of rendering the characters involved.
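
As a rough illustration of that point, the sketch below encodes a correctly stored Cyrillic string for a destination that cannot represent it. US-ASCII is used here purely as a stand-in for a limited console encoding (an assumption for the example, not the asker's actual setup), and `UnmappableDemo` and `roundTrip` are made-up names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {

    // Hypothetical helper: encodes the text for a destination that only understands
    // the given charset, then decodes it again to show what would be displayed.
    static String roundTrip(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    public static void main(String[] args) {
        // "Moscow" in Cyrillic, written with escapes so the source file stays ASCII.
        String moscow = "\u041c\u043e\u0441\u043a\u0432\u0430";

        // US-ASCII has no Cyrillic letters, so each one is replaced by '?'.
        System.out.println(roundTrip(moscow, StandardCharsets.US_ASCII)); // prints ??????

        // UTF-8 can represent every character, so the string survives unchanged.
        System.out.println(roundTrip(moscow, StandardCharsets.UTF_8).equals(moscow)); // prints true
    }
}
```

The string in memory is the same in both cases; only the encoding applied on the way out differs.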

Jim Garrison
  • I am using Mac OS. How can I make it print correctly? The default charset is x-MacCyrillic. – Morckovka Sep 25 '12 at 22:18
  • Print _where_? In a terminal window? It may be possible to set the default encoding for the terminal, but I don't work on a Mac so I don't know if the terminal program is capable of displaying UTF-8, even if the encoding is set. – Jim Garrison Sep 25 '12 at 22:22