-1

I have written Java code that stores an uploaded text document and then returns the text in that file. All of the text is in the Sinhala language, encoded as UTF-8.

        streamReader = new InputStreamReader(new FileInputStream(new File(filePath)), "utf8" /*Here I have tried 'UTF-8', 'utf-8'*/);
        br = new BufferedReader(streamReader);
        PrintStream printStream= new PrintStream(f);
        while ((line = br.readLine()) != null) {
            .....
        }

The output is sent directly to a JSP page, where it is shown as '????????????????????????????????'.

I am on Windows 8.1 with Tomcat and Java 7. I have tested the JSP with Sinhala characters and they display correctly. I have also set UTF-8 in the content type.

I have tried this one, this one, and this one too.

bula
  • I would try setting UTF-8 for the output i.e. PrintStream as well. – Peter Lawrey Aug 18 '15 at 16:27
  • Unicode characters certainly *are* recognized in Java, but your Java program can read, manipulate, or output them incorrectly if you so choose. Also, whatever mechanism you are using to examine the result might do the wrong thing with your program's output. Your code fragment looks a bit suspicious, but you haven't given us a complete example to work with, so we can't say much. – John Bollinger Aug 18 '15 at 16:28
  • @PeterLawrey, `PrintStream`s and other `OutputStream`s don't *have* an encoding associated with them. That is what I find most suspicious, in fact. `OutputStream`s are for writing binary data; for character data one should use a `Writer`. – John Bollinger Aug 18 '15 at 16:32
  • You need to show us the JSP, and to show how the output is sent directly to it. Also make sure that the font used in your web page is able to show those "sinhala" characters. – JB Nizet Aug 18 '15 at 16:35
  • @JohnBollinger I was thinking of PrintWriter, so your comment helped me come up with an answer. – Peter Lawrey Aug 18 '15 at 16:37

3 Answers

1

The JSP must declare UTF-8 as its encoding, and every InputStreamReader and OutputStreamWriter involved must be given the UTF-8 character set explicitly.

<%@ page contentType="text/html; charset=UTF-8" %>

0

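To set the encoding for a Writer, wrap the output stream in an OutputStreamWriter (if f is a File or a path rather than an OutputStream, wrap it in a FileOutputStream first):

PrintWriter out = new PrintWriter(new OutputStreamWriter(f, "UTF-8"));

You can use a PrintWriter instead of a PrintStream, as it has the same print methods.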
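
As a rough illustration of the idea above, here is a minimal sketch that reads and writes with UTF-8 named explicitly on both sides. The class name `Utf8Copy` and the method `copy` are hypothetical, and in-memory streams stand in for the uploaded file and the output `f`; it assumes Java 7 or later for `StandardCharsets`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the question's code.
class Utf8Copy {

    // Copies text line by line, decoding and encoding as UTF-8 on both sides.
    static void copy(InputStream source, OutputStream target) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(target, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            out.println(line);
        }
        // PrintWriter buffers its output, so flush before the bytes are inspected.
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // The word "Sinhala" in Sinhala script, written with Unicode escapes
        // so the source file's own encoding does not matter.
        String sinhala = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD";
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(sinhala.getBytes(StandardCharsets.UTF_8)), target);
        String copied = new String(target.toByteArray(), StandardCharsets.UTF_8).trim();
        System.out.println(copied.equals(sinhala)); // prints true
    }
}
```

With real files, the same pattern would apply with a `FileInputStream` and `FileOutputStream` passed to `copy`.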

Peter Lawrey
  • and the `JSP/HTML` headers need to specify `UTF-8` as well for this to be a complete solution. `<%@ page contentType="text/html; charset=UTF-8" %>` –  Aug 18 '15 at 16:38
0

You need to ensure the correct encoding of the HTTP response.

If you insert the text in JSP, set the JSP encoding at the top of the .jsp file (see also UTF-8 encoding in JSP page):

<%@ page contentType="text/html; charset=UTF-8" %>
...
<c:out value="${myDocumentTextInUnicode}"/>

If generating the response in a servlet, set the encoding there:

response.setContentType("text/plain");
response.setCharacterEncoding("UTF-8");
PrintWriter out = response.getWriter();
while ((line = br.readLine()) != null) {
    out.println(line);
}
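
One plausible explanation for the question marks is that the text is encoded with a charset that cannot represent Sinhala somewhere on the way out; servlet containers typically fall back to ISO-8859-1 when no response encoding is set. The sketch below only demonstrates that effect with plain `String` calls and does not touch the servlet API. The class name `QuestionMarkDemo` and the method `throughCharset` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class QuestionMarkDemo {

    // Encodes the text to bytes with the given charset, then decodes it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for ISO-8859-1.
    static String throughCharset(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The word "Sinhala" in Sinhala script, written with Unicode escapes.
        String sinhala = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD";

        // ISO-8859-1 cannot represent Sinhala, so every character is lost.
        System.out.println(throughCharset(sinhala, StandardCharsets.ISO_8859_1)); // prints ?????

        // UTF-8 can represent it, so the text survives unchanged.
        System.out.println(throughCharset(sinhala, StandardCharsets.UTF_8).equals(sinhala)); // prints true
    }
}
```

If the page shows one question mark per character, as in the first case, the bytes were probably already damaged before the browser received them, so changing fonts or browser settings would not help.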
Andreas