
I download a file from a website using a Java program, and the response header looks like this:

Content-Disposition: attachment;filename="Textkürzung.asc";

There is no encoding specified

After downloading, I pass the file name to another application for further processing. I use

System.out.println(filename);

On standard out the string is printed as Textk³rzung.asc

How can I change the Standard Out to "UTF-8" in Java?

I tried encoding the string to "UTF-8", but the output is still the same.

Update:

I was able to fix this without any code change. In the place where I call my jar file from the other application, I did the following:

java -Dfile.encoding=UTF-8 -jar ....

This seems to have fixed the issue.
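As an aside (not part of the original question): the property name is file.encoding, all lower case, and since Java 18 (JEP 400) the default charset is UTF-8 regardless of the OS locale. A minimal sketch for checking what the JVM actually picked up (the class name EncodingCheck is just illustrative):

import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // The charset the JVM resolved at startup
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        // The system property, e.g. as set via -Dfile.encoding=UTF-8 on the command line
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}

Run it with and without the -D flag to confirm that the launcher of the jar really passes it through.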

Thank you all for your support.

KK99
  • You need to read the input correctly. Then you just print the file. If you try to print a file that isn't UTF-8 to standard out you'll just get garbage again. – markspace Feb 17 '15 at 17:20
  • You can't "change the standard out to UTF-8" from the Java side, instead you need to work out what encoding standard out expects, then ensure that you use that encoding from Java when printing the string. – Ian Roberts Feb 17 '15 at 17:20
  • Set a breakpoint and inspect variable before printing...does it show up correctly there? If so, you may need to change your IDE settings to display UTF-8 properly in the console. – jlewkovich Feb 17 '15 at 17:20
  • For reference, ü in Unicode is U+00FC, and the byte 0xFC corresponds to ³ in Windows code page 850. – Ian Roberts Feb 17 '15 at 17:28 (demonstrated in the sketch below these comments)
  • The important question is: How do you create the String `filename`? All Java Strings are in Unicode, so printing it should just work (unless `System.out` is improperly configured on your system). – Harald K Feb 17 '15 at 17:56
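As an aside (not from the original thread): Ian Roberts's observation is easy to reproduce, because the same byte value simply decodes to different characters under the two code pages. A minimal sketch (the class name MojibakeDemo is just illustrative; how the output renders still depends on your console's encoding, which is the whole point):

import java.nio.charset.Charset;

public class MojibakeDemo {
    public static void main(String[] args) {
        byte[] b = { (byte) 0xFC };  // the single byte that windows-1252 uses for ü (U+00FC)
        System.out.println(new String(b, Charset.forName("windows-1252"))); // ü
        System.out.println(new String(b, Charset.forName("IBM850")));       // ³ (code page 850)
    }
}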

3 Answers


The default encoding of System.out is the operating system default. On international versions of Windows this is usually the windows-1252 code page. If you are running your code on the command line and the terminal happens to use the same encoding, special characters are displayed correctly. But if you are running the code some other way, or sending the output to a file or another program, that consumer might expect a different encoding. In your case, apparently, UTF-8.

You can actually change the encoding of System.out by replacing it:

try {
    // Replace System.out with a PrintStream that encodes text as UTF-8 and
    // writes directly to the original stdout file descriptor
    System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    throw new InternalError("VM does not support mandatory encoding UTF-8");
}

This works for cases where using a new PrintStream is not an option, for instance because the output comes from library code which you cannot change, you have no control over system properties, or changing the default encoding of everything is not appropriate.
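As an aside (not part of the original answer), a minimal sketch of how this is typically wired up, assuming the consumer of stdout reads it as UTF-8; the class name Utf8StdOut is just illustrative:

import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8StdOut {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Replace System.out once, as early as possible
        System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));

        // From here on every caller of System.out, including library code
        // you cannot change, goes through the UTF-8 encoder.
        System.out.println("Textk\u00FCrzung.asc"); // \u00FC is ü, escaped to avoid source-encoding issues
    }
}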

Pepijn Schmitz
  • This looks like the correct answer to me. Encode to UTF-8, then output the encoded bytes straight to the FileDescriptor. The two other upvoted solutions could experience intermittent issues due to the double encoding. – nicktalbot May 11 '18 at 09:59

The result you're seeing suggests your console expects text to be in Windows "code page 850" encoding - the character ü has Unicode code point U+00FC. The byte value 0xFC renders in Windows code page 850 as ³. So if you want the name to appear correctly on the console then you need to print it using the encoding "Cp850":

// Note: OutputStreamWriter(OutputStream, String) can throw the checked UnsupportedEncodingException
PrintWriter consoleOut = new PrintWriter(new OutputStreamWriter(System.out, "Cp850"), true); // true = autoflush on println
consoleOut.println(filename);

Whether this is what your "other application" expects is a different question - the other app will only see the correct name if it is reading its standard input as Cp850 too.
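As an aside (not from the original answer): if the encoding the consuming application expects is only known at deployment time, it can be made configurable instead of hard-coding Cp850. A minimal sketch, assuming a hypothetical consumer.encoding system property set by whoever launches the jar:

import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

public class ConfigurableOut {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical property naming the charset the downstream reader uses,
        // e.g. -Dconsumer.encoding=Cp850 or -Dconsumer.encoding=UTF-8
        String enc = System.getProperty("consumer.encoding", "UTF-8");
        PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, enc), true); // autoflush on println
        out.println("Textk\u00FCrzung.asc"); // \u00FC is ü
    }
}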

Ian Roberts
  • I am not sure if I understood you correctly. The other application (BPEL process) we have is calling a Java JAR and waiting for the response (stdout). The (Windows) server where these two applications reside uses code page 850. I am really not sure what else I should do to set the code page. Also, PrintWriter writes to a file and not to the console. – KK99 Feb 18 '15 at 11:41
  • @KarthikKrishnan if you're piping the output of this Java program into another process (the BPEL engine) then you can ignore what it looks like on the console when you run standalone. What matters is that the encoding you use to write things to stdout in the Java process is the same as the encoding which the BPEL engine uses to read it. If this is something you can configure on the BPEL side then configure both sides to use UTF-8 for maximum compatibility. If it isn't then you need to find out what encoding the BPEL expects and then make your Java program use the same. – Ian Roberts Feb 18 '15 at 15:36
  • I believe this has the same issue as proxysingleton's answer. It will probably work some of the time. You're wrapping one encoding with another encoding. After encoding to Cp850 the bytes need to be passed straight to FileDescriptor.out. – nicktalbot May 11 '18 at 09:56

Try to use:

// PrintStream(OutputStream, boolean autoFlush, String encoding) declares UnsupportedEncodingException
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println(filename);
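As an aside (not part of the original answer): on Java 10 or newer there is an overload that takes a Charset and avoids the checked exception; a minimal sketch (the class name Utf8Print is just illustrative):

import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Print {
    public static void main(String[] args) {
        // PrintStream(OutputStream, boolean autoFlush, Charset) exists since Java 10
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        out.println("Textk\u00FCrzung.asc"); // \u00FC is ü
    }
}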
  • I get `Textk├╝rzung_.asc` – KK99 Feb 17 '15 at 17:25
  • @KarthikKrishnan This *writes* UTF-8 to the console. However the terminal only understands some variation of ANSI (probably [CP 437](http://en.wikipedia.org/wiki/Code_page_437)) and *not* UTF-8. Redirect the output to a file, then open the file in a UTF-8 aware editor and the "correct text" should be displayed. – user2864740 Feb 17 '15 at 17:28
  • What I have is a process reading this sysout. I get this there too. I want to fix the stream. – KK99 Feb 17 '15 at 17:30
  • @KarthikKrishnan The *writing* is correct; the reading (or viewing) is not. Use `java theprogram > thefile.txt` and then open "thefile.txt" in wordpad.exe (not notepad!) to verify. – user2864740 Feb 17 '15 at 17:30
  • @user2864740 That works but I don't need to write to file to read. I need to feed this to another application to read – KK99 Feb 18 '15 at 11:07
  • @KarthikKrishnan The "other application" needs to *read it correctly* (as UTF-8) and then *display it correctly* (with the correct font, as Unicode correctly decoded from UTF-8). The example above with the file redirect shows that *Java is doing its part correctly* - ***the "other application" is not reading the byte stream correctly in UTF-8*** (or not displaying it correctly as Unicode, e.g. if the "other application" writes directly to the console). As previously stated, command.com/cmd.exe is *not* UTF/Unicode-aware. With this approach the data *in the stdout stream* is UTF-8, as requested. – user2864740 Feb 18 '15 at 15:54 (see the sketch after these comments)
  • This looks wrong. You are wrapping a PrintStream that also contains an encoding. You need to replace System.out with FileDescriptor.out. See Pepijn's answer. – nicktalbot May 11 '18 at 09:53
  • Instead of `System.out.println("[전송 완료]");`, use the two lines: `PrintStream out = new PrintStream(System.out, true, "UTF-8"); out.println("[전송 완료]");` ("전송 완료" is Korean for "transfer complete"). – Park JongBum Aug 25 '21 at 13:38
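As an aside (not from the original thread): whatever process consumes this program's stdout has to decode it with the same charset. If that consumer were itself written in Java, the reading side could look like this minimal sketch (the jar name downloader.jar is purely illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadChildStdout {
    public static void main(String[] args) throws Exception {
        // Illustrative command line; the real jar name and flags will differ
        Process p = new ProcessBuilder("java", "-Dfile.encoding=UTF-8", "-jar", "downloader.jar").start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // decoded correctly because both sides agreed on UTF-8
            }
        }
        p.waitFor();
    }
}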