LATEST SSCCE

Why does the example below output different strings?

package tests.java;

import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.util.Arrays;

public class Try_PrintWriterEncoding3 {

    public static void main(String[] args) {
        final PrintStream oldOut = System.out;

        System.out.println("Привет, мир!");

        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                oldOut.print(new String(new byte[] {(byte)b}, Charset.defaultCharset())); 
            }
        }));

        System.out.println("Привет, мир!");

        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                throw new UnsupportedOperationException(); 
            }

            @Override
            public void write(byte[] b, int off, int len) throws IOException {
                // Honor the offset as well as the length of the chunk that was passed in.
                oldOut.print(new String(Arrays.copyOfRange(b, off, off + len), Charset.defaultCharset()));
            }

            @Override
            public void write(byte[] b) throws IOException {
                throw new UnsupportedOperationException(); 
            }
        }));

        System.out.println("Привет, мир!");
    }
}

PREVIOUS EXAMPLES

I would like to write a custom stdout stream, but it fails on non-ASCII (international) text.

It is said that PrintStream converts characters to bytes according to the default encoding. That would suggest that the bytes should be decoded with the default encoding too.

But that doesn't work.

Neither does any other encoding I tried.

package tests.java;

import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

import java.nio.charset.Charset;

public class Try_PrintWriterEncoding {

    public static void main(String[] args) {

        final PrintStream oldOut = System.out;

        System.out.println("Привет, мир!");

        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                oldOut.write(b); // works
            }
        }));

        System.out.println("Привет, мир!");

        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                oldOut.print(new String(new char[] {(char)b})); // does not work (garbage type 1)
            }
        }));

        System.out.println("Привет, мир!");

        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                oldOut.print(new String(new byte[] {(byte)b})); // does not work (garbage type 2)

            }
        }));

        System.out.println("Привет, мир!");

        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                oldOut.print(new String(new byte[] {(byte)b}, Charset.defaultCharset())); // does not work (garbage type 2)
            }
        }));

        System.out.println("Привет, мир!");

        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                oldOut.print(new String(new byte[] {(byte)b}, Charset.forName("UTF-8"))); // does not work (garbage type 2)
            }
        }));

        System.out.println("Привет, мир!");


        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                oldOut.print(new String(new byte[] {(byte)b}, Charset.forName("CP866"))); // does not work (garbage type 3)
            }
        }));

        System.out.println("Привет, мир!");

        System.setOut(new PrintStream(new OutputStream() {

            @Override
            public void write(int b) throws IOException {
                oldOut.print(new String(new byte[] {(byte)b}, Charset.forName("Cp1251"))); // does not work (garbage type 4)
            }
        }));

        System.out.println("Привет, мир!");
    }
}

OUTPUT

(screenshot of the console output; image not available)

Kenster
Suzan Cioc

1 Answer

Change

 }));

into

 }), true, encoding);

Here true turns on automatic flushing (on println and on newlines), and encoding is the name of the desired charset, say "Windows-1251".
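As a rough illustration of that three-argument constructor, the sketch below prints through a `PrintStream` created with an explicit charset name and decodes the captured bytes with the same name. The class `ExplicitEncodingDemo`, the helper `roundTrip`, and the `ByteArrayOutputStream` sink are assumptions made for this example; they do not appear in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ExplicitEncodingDemo {

    // Hypothetical helper: encodes text through a PrintStream that uses the
    // given charset name, then decodes the captured bytes with that same name.
    static String roundTrip(String text, String encoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, encoding);
            out.print(text);
            out.flush();
            return new String(sink.toByteArray(), encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440!"; // "Привет, мир!"
        for (String encoding : new String[] {"UTF-8", "windows-1251"}) {
            // Skip charsets this runtime does not ship.
            if (Charset.isSupported(encoding)) {
                System.out.println(encoding + ": " + text.equals(roundTrip(text, encoding)));
            }
        }
    }
}
```

The text only survives when the bytes are decoded as a whole and with the charset that produced them, which is the condition the rest of this answer tries to arrange.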

This will never work reliably for the real console, because the console's encoding is defined by the operating system.

Otherwise you have to fake a console, as IDEs do, or make sure that the console (cmd.exe) runs with a Unicode code page.


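  System.setOut(new PrintStream(new OutputStream() {

        // Collects the bytes of one line so that they can be decoded together.
        byte[] line = new byte[1024];
        int pos = 0;

        @Override
        public void write(int b) throws IOException {
            line[pos++] = (byte) b;
            // Note: a flush forced by a full buffer can still split a multi-byte character.
            if (pos >= line.length || b == '\n') {
                flush();
            }
        }

        @Override
        public void flush() throws IOException {
            if (pos > 0) {
                // The buffer already holds the line terminator, so print rather than println.
                // ENCODING should name the same charset as the encoding passed to PrintStream below.
                oldOut.print(new String(line, 0, pos, ENCODING));
                pos = 0;
            }
            oldOut.flush();
        }
    }), true, encoding);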

First try omitting ENCODING; decoding then falls back to the JVM's default charset, which is normally taken from the operating system.
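For reference, here is a self-contained sketch of the same line-buffering idea. It makes a few assumptions of its own: the names `LineDecodingDemo`, `LineDecodingOutputStream`, and `relay` are invented for the example, the buffer is a growable `ByteArrayOutputStream` rather than a fixed 1024-byte array, and UTF-8 is picked arbitrarily as the charset shared by the `PrintStream` and the decoder.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LineDecodingDemo {

    // Hypothetical stream: buffers bytes and decodes them as a block, never byte by byte.
    static class LineDecodingOutputStream extends OutputStream {
        private final ByteArrayOutputStream line = new ByteArrayOutputStream();
        private final PrintStream target;
        private final Charset charset;

        LineDecodingOutputStream(PrintStream target, Charset charset) {
            this.target = target;
            this.charset = charset;
        }

        @Override
        public void write(int b) throws IOException {
            line.write(b);
            if (b == '\n') {
                flush();
            }
        }

        @Override
        public void flush() throws IOException {
            // Assumes writes arrive via print/println, so a flush does not land
            // in the middle of a multi-byte character.
            if (line.size() > 0) {
                target.print(new String(line.toByteArray(), charset));
                line.reset();
            }
            target.flush();
        }
    }

    // Hypothetical check: sends text through the wrapper and returns what the target received.
    static String relay(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream target = new PrintStream(sink, true, "UTF-16BE");
            PrintStream out = new PrintStream(
                    new LineDecodingOutputStream(target, charset), true, charset.name());
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) throws IOException {
        Charset charset = StandardCharsets.UTF_8; // assumption: any charset works if both sides agree
        PrintStream oldOut = System.out;
        System.setOut(new PrintStream(new LineDecodingOutputStream(oldOut, charset), true, charset.name()));
        System.out.println("\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440!");
        System.setOut(oldOut);
    }
}
```

Whether the text then looks right on screen still depends on how `oldOut` encodes it for the console, which this sketch does not control.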

Joop Eggen
  • I should not touch `PrintStream`; I should recover the text from its bytes. – Suzan Cioc Mar 26 '15 at 16:11
  • The question is how to extract the correct symbols from the bytes produced by the default `PrintStream`. – Suzan Cioc Mar 26 '15 at 16:13
  • So you capture the bytes in a file? Then you might open that file in a programmer's editor like NotePad++ or JEdit, and switch encodings. – Joop Eggen Mar 26 '15 at 16:17
  • I think I see UTF-16 (two bytes per char); in any case, byte-wise conversion is unrealistic. You would need to create a byte array for an input line, and convert that with `new String(bytes, encoding)`. – Joop Eggen Mar 26 '15 at 16:26
  • I wish to stay in Java only. – Suzan Cioc Mar 26 '15 at 16:26
  • Ensure that you are compiling in the same encoding as the editor uses. That looked to be the case, but try UTF-8. Use `\u041f` instead of `П`. In any case you need a byte array. – Joop Eggen Mar 26 '15 at 16:29
  • If the editor were using a different encoding, then the plain call would also give a garbled result. – Suzan Cioc Mar 26 '15 at 16:31
  • Yes, I thought so too. It looks like PrintStream does not use a single-byte encoding. – Joop Eggen Mar 26 '15 at 16:40
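
The point made in these comments, that bytes from a multi-byte encoding cannot be decoded one at a time, can be sketched as follows. The class and method names are made up for the illustration, and UTF-8 is only an assumed example of a multi-byte charset; the charset actually in effect in the question is not known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteWiseDecodingDemo {

    // Decodes every byte on its own, as the write(int) overrides in the question do.
    static String decodeByteWise(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(new String(new byte[] {b}, charset));
        }
        return sb.toString();
    }

    // Decodes the complete byte array in a single call.
    static String decodeWhole(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442"; // "Привет", written with escapes
        Charset utf8 = StandardCharsets.UTF_8;                 // assumed multi-byte charset
        byte[] bytes = text.getBytes(utf8);                    // two bytes per Cyrillic letter

        System.out.println(text.equals(decodeWhole(bytes, utf8)));    // true
        System.out.println(text.equals(decodeByteWise(bytes, utf8))); // false
    }
}
```

With a single-byte charset both methods would agree, which may be why the byte-wise approach looks plausible at first.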