Not sure whether this is a programming problem. I began to suspect so, but then I ran the Java program in question (an executable jar) in a Windows console instead of a Cygwin one, and it ran fine: accented output displayed correctly and accented input was accepted. So what follows applies only to the Cygwin console.

I'm processing some French text. When accented characters are printed (System.out) a sort of "hashed box" is printed instead. I saw another question here about this but there was no solution or proper explanation given.

And when I enter accented characters these are read in incorrectly (Java System.in), e.g. "bénéfice" is then printed out (in the log which is handling encoding correctly) as "bénéfice".

What is puzzling (perhaps) is that I am able to type "bénéfice" in the console. The font Deja Vu Sans Mono is meant to handle Unicode well, as I understand it. So might this have something to do with the Java System.in and System.out streams?

For the avoidance of doubt, this is Cygwin on a Windows platform (does anyone use Cygwin on a non-Windows OS?).

I have tried changing the locale, character set and font via Options --> Text. None of these changes gets rid of the boxes. At the moment the settings are the default ones:
Font: Deja Vu Sans Mono
Locale: en_GB
Character set: UTF-8

At the command prompt, when I go

$ locale

I get

LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=

Anyone know what I should do?

mike rodent
  • 1
    Cygwin is of course only for Windows. Please note that Java is not a cygwin application so it will likely ignore the cygwin locale setting. – matzeri Jan 30 '17 at 20:46
  • 1
    Thanks. So... in practice what might this mean in terms of bytes and encodings and a possible solution? Is there a way to get Java to send streams in a particular encoding? I confess, encoding has always baffled me utterly. – mike rodent Jan 30 '17 at 20:52
  • 1
    This is most likely just a mismatch in the charset used by your java-instance and the cygwin console. Might be worthwhile to check the character-encoding used by your java-instance. [This question](http://stackoverflow.com/questions/2415597/java-how-to-detect-and-change-encoding-of-system-console) might also be quite helpful in solving this problem. –  Jan 30 '17 at 20:54
  • 1
    what code did you use to read string from `System.in`? – ZhongYu Jan 30 '17 at 21:02
  • 1
    @Paul great! You've solved half the problem: `PrintStream out = new PrintStream(System.out, true, "UTF-8");`. Now I'm trying to work out how to get a `BufferedReader` do `readLine` with the right encoding... – mike rodent Jan 30 '17 at 21:05
  • @ZhongYu `BufferedReader br = new BufferedReader( new InputStreamReader(System.in) );`. Presumably this is not using UTF8. Do you know how I solve that? – mike rodent Jan 30 '17 at 21:06
  • UTF-8 for InputStreamReader – ZhongYu Jan 30 '17 at 21:08
  • You should post the entire code of reading from `in` then printing to `out`. – ZhongYu Jan 30 '17 at 21:13

1 Answer

Thanks to Paul and Zhong Yu for the answers here.

To print to Cygwin do this sort of thing:

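// Wrap System.out so text is encoded as UTF-8 (autoflush on); this constructor throws UnsupportedEncodingException
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.print( outputString );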

To read from Cygwin do this sort of thing:

// Decode the bytes arriving on System.in as UTF-8 instead of the JVM's default charset
BufferedReader br = new BufferedReader( new InputStreamReader(System.in, "UTF-8") );
String nextInputLine = br.readLine();
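
For anyone wanting to try both halves together, here is a minimal sketch of a complete echo program. The class name `Utf8ConsoleEcho` and the helpers `utf8Reader` and `utf8Printer` are just names made up for this illustration, and it assumes the Cygwin terminal's character set is left at UTF-8 as in the settings above. It also prints the JVM's default charset to stderr, which may help show the mismatch the comments were pointing at.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8ConsoleEcho {

    // Hypothetical helper: decode incoming bytes as UTF-8, ignoring the JVM default charset.
    static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Hypothetical helper: encode outgoing text as UTF-8, with autoflush enabled.
    static PrintStream utf8Printer(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Shows which charset the JVM would have used without the wrappers.
        System.err.println("JVM default charset: " + Charset.defaultCharset());

        BufferedReader br = utf8Reader(System.in);
        PrintStream out = utf8Printer(System.out);

        // Echo each input line back until end of input (Ctrl-D in a Cygwin terminal).
        String line;
        while ((line = br.readLine()) != null) {
            out.println(line);
        }
    }
}
```

If the default charset reported on stderr is something other than UTF-8 (a Windows code page, say), that would be consistent with the boxes and the garbled input seen without the wrappers.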

Slightly amazed that this question has not come up before re Cygwin.

mike rodent