2

I have a problem getting Hebrew characters from an HTTP GET request.

I get square characters like this: "[]" instead of the Hebrew characters.

The English characters are OK.

This is my function:

public String executeHttpGet(String urlString) throws Exception {
    BufferedReader in = null;
    try {
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet();
        request.setURI(new URI(urlString));
        HttpResponse response = client.execute(request);
        in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(),"UTF-8"));
        StringBuffer sb = new StringBuffer("");
        String line = "";
        String NL = System.getProperty("line.separator");
        while ((line = in.readLine()) != null) {
            sb.append(line + NL);
        }
        in.close();
        String page = sb.toString();
        // System.out.println(page);
        return page;
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

You can test it with this example URL:

String str = executeHttpGet("http://kavim-t.co.il/include/getXMLStations.asp?parent=7_%20_1");

Thank you!

David
  • How are you displaying the received text? Are you sure it isn't just the console output which doesn't have those characters in its font? – Graham Borland Feb 24 '12 at 12:46
  • I'm using a TextView to display the received text, but I can already see the problem earlier, in debug mode. – David Feb 24 '12 at 12:50
  • That also would be my assumption. Try to save the website to a file and then to display it with your browser. If that works out, you're fine. – devsnd Feb 24 '12 at 12:51

4 Answers

5

The file you linked to doesn't seem to be UTF-8. I tested it and it opens correctly as WINDOWS-1255 (a Hebrew encoding), so you should try that instead of UTF-8.
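
As a rough illustration of why the charset name matters, here is a minimal sketch that decodes the same bytes once as windows-1255 and once as UTF-8. The class name `Windows1255Demo`, the helper `decode`, and the four sample bytes are made up for this example; they stand in for the response body and are not taken from the real site. It also assumes the runtime ships the windows-1255 charset, which standard desktop JDKs normally do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class Windows1255Demo {
    // Hypothetical sample body: the Hebrew word "shalom" encoded as windows-1255.
    static final byte[] SAMPLE = {(byte) 0xF9, (byte) 0xEC, (byte) 0xE5, (byte) 0xED};

    // Decodes the bytes through an InputStreamReader, as the question's code does,
    // but with the charset name supplied by the caller.
    static String decode(byte[] body, String charsetName) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(body), Charset.forName(charsetName))) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String right = decode(SAMPLE, "windows-1255");
        String wrong = decode(SAMPLE, "UTF-8");
        // The correct charset yields Hebrew letters (U+05D0..U+05EA).
        System.out.println("windows-1255 gives Hebrew: "
                + (right.charAt(0) >= '\u05D0' && right.charAt(0) <= '\u05EA'));
        // The wrong charset yields replacement characters, which fonts often draw as squares.
        System.out.println("UTF-8 gives U+FFFD: " + (wrong.indexOf('\uFFFD') >= 0));
    }
}
```

In the question's function the equivalent change would be passing "windows-1255" instead of "UTF-8" as the second argument of the `InputStreamReader` constructor.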

Lycha
  • Wow, you are right! My mistake. Thanks a lot. Can you please tell me how you found out that this is 'WINDOWS-1255'? – David Feb 24 '12 at 13:00
  • 1
    @David - it's in the response header: `Content-Type:text/xml; Charset=windows-1255` – McDowell Feb 24 '12 at 13:05
  • Using this code: `Header[] header = response.getAllHeaders();` I can see more details but I can't see `Content-Type:text/xml; Charset=windows-1255` – David Feb 24 '12 at 13:20
  • 1
    It is part of the HTTP response header. You can read it with your browser's web developer tools (assuming you're using Firefox or Chrome). – devsnd Feb 24 '12 at 13:24
  • @David I saved the file and used gedit text editor on Linux to open it. It allows me to try out different encodings. You can also use Chrome Developer Tools to see the encoding (in Chrome on your page press F12 and go to Network tab, then refresh the page and you see more details). – Lycha Feb 24 '12 at 14:51
0

Try a different website; it looks like this one doesn't use UTF-8. Alternatively, UTF-16 may work, but I haven't tried it. Your code looks fine.

Dororo
0

As others have pointed out, the content is not actually encoded as UTF-8. You might want to look at `httpEntity.getContentType()` to extract the actual encoding of the content from the Content-Type header, and then pass it to your `InputStreamReader`. Your code will then be able to cope correctly with any encoding the server declares.
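
One way this could look is sketched below, using only the standard library so it runs on its own. The class `ContentTypeCharset` and the method `charsetOf` are hypothetical names for this example. It assumes you already have the header value as a string, for instance from `response.getEntity().getContentType().getValue()` in the question's code after checking that the header is not null. The parameter name is matched case-insensitively because this server sends `Charset=` with a capital C.

```java
import java.nio.charset.Charset;
import java.util.Locale;

class ContentTypeCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/xml; Charset=windows-1255", or the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                // Covers both illegal and unsupported charset names.
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = Charset.forName("ISO-8859-1");
        Charset cs = charsetOf("text/xml; Charset=windows-1255", fallback);
        System.out.println(cs.name());
    }
}
```

The resulting `Charset` can be passed as the second argument to `new InputStreamReader(stream, charset)`. The fallback is a design choice for this sketch: a server that omits the parameter or names an unknown charset still produces readable output instead of an exception.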

Graham Borland
-1

Hi, as posted in this other question, "Special characters in PHP / MySQL":

You can set the character encoding in the PHP file. In that example they set UTF-8, but you can set a different encoding that supports the characters you need.

Pedro Teran
  • I already set it to UTF-8, as you can see in my code, and it didn't help. In addition, I'm using Java, not PHP. – David Feb 24 '12 at 12:53