
I have a Groovy script that takes user input from the CLI. The CLI supports Cyrillic characters, and both its encoding and charset are set to UTF-8. Yet when Groovy reads input containing Cyrillic characters, all it sees is "???????". Groovy also cannot create a directory or file from the given parameter. Does anyone have ideas on how to make Groovy accept Cyrillic characters? Thanks.

1 Answer


Make sure the reader you use has the same encoding as your CLI. If the encodings already match, the problem may lie in displaying the characters rather than in reading them. You can check which Unicode code points Groovy receives like this:

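// test.groovy
// Read one line from stdin, decoding the bytes explicitly as UTF-8.
def input = System.in.withReader('UTF-8') { it.readLine() }

// Print each character next to its Unicode code point.
input.eachWithIndex { ch, index ->
    println "$ch, ${Character.codePointAt(input, index)}"
}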

Run this from the CLI:

$ echo $LANG
en_US.UTF-8
$ echo Здра́вствуйте | groovy test.groovy
З, 1047
д, 1076
р, 1088
а, 1072
́, 769
в, 1074
с, 1089
т, 1090
в, 1074
у, 1091
й, 1081
т, 1090
е, 1077
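
If the script prints 63 (the code point of `?`) for every Cyrillic character, the characters were probably replaced before Groovy decoded anything. A minimal sketch for checking that, assuming nothing beyond the JDK and the GDK: dump the raw bytes as hex, then decode them with an explicitly named charset. The helper names `toHex` and `codePoints` are hypothetical, and the sample bytes are built in memory instead of being read from the console.

```groovy
import java.nio.charset.Charset

// Hypothetical helper: render raw bytes as space-separated hex, before any decoding.
String toHex(byte[] bytes) {
    bytes.collect { String.format('%02X', it & 0xFF) }.join(' ')
}

// Hypothetical helper: decode bytes with the named charset and list the code points.
List<String> codePoints(byte[] bytes, String charsetName) {
    String text = new String(bytes, Charset.forName(charsetName))
    text.codePoints().toArray().collect { String.format('U+%04X', it) }
}

// Sample data written with \u escapes so the script's own file encoding does not matter.
byte[] intact = '\u0417\u0434'.getBytes('UTF-8')   // "Зд" as UTF-8
byte[] lossy  = [0x3F, 0x3F] as byte[]             // what arrives if the console substituted '?'

println toHex(intact)                 // D0 97 D0 B4
println codePoints(intact, 'UTF-8')   // [U+0417, U+0434]
println toHex(lossy)                  // 3F 3F
println codePoints(lossy, 'UTF-8')    // [U+003F, U+003F]
```

To inspect real console input, pass `System.in.bytes` instead of the sample arrays. Seeing `3F` where lead bytes such as `D0` or `D1` should be would suggest that the console code page, not the reader, is losing the characters.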
ataylor
  • I tried `System.in.withReader("UTF-8") { println it.readLine(); }` and it still output ?????? when given Cyrillic input. I'm more concerned with properly creating a directory than with outputting the correct text. – UrbanCoder Mar 14 '13 at 15:35
  • What OS? It's possible Java is using another encoding for filenames. Try adding `-Dfile.encoding=UTF-8` to the command line. – ataylor Mar 14 '13 at 15:48
  • This is on Windows. My encoding is set to UTF-8 through a system property, using `JAVA_TOOL_OPTS=-Dfile.encoding=UTF-8` – UrbanCoder Mar 14 '13 at 16:06
  • +1, this problem was posted here some time ago, and the cause was the encoding of the OP's CLI. – Will Mar 14 '13 at 16:35
  • The script does not output the correct code points. It outputs the correct code points for the English characters, but for the Cyrillic characters it prints `?: 63`. For what it's worth, at the beginning of the script I have `println "Encoding is " + System.properties['file.encoding'] println "charet is " + java.nio.charset.Charset.defaultCharset()`, which outputs `UTF-8` for both statements. – UrbanCoder Mar 14 '13 at 17:25
  • Change to a UTF-8 code page in your CLI. See http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how – ataylor Mar 14 '13 at 17:51
  • That resulted in different behavior. Now something breaks when reading a line with Cyrillic characters: the program doesn't recognize that I entered any input, and it won't print anything it receives. However, if I provide input without Cyrillic characters, it works fine. – UrbanCoder Mar 14 '13 at 18:17
  • It is also worth noting that attempting to print Cyrillic characters from Groovy itself results in some errors. I had `println "Администратор"` and ended up with `Администратор[]тратортор[]р[]`, where [] is one of those "I don't know what to do with this character" boxes. – UrbanCoder Mar 14 '13 at 21:38
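
For the output side raised in the last comment, one thing that can be ruled out is the JVM's default charset: a `PrintStream` built with an explicit charset name always encodes the same way, whatever `file.encoding` says. The following is only a sketch under that assumption. The helper name `utf8Stream` is hypothetical, and the demonstration writes to an in-memory buffer so the encoded bytes can be inspected. It does not address the console's own code page or font, which may still garble the result.

```groovy
// Hypothetical helper: a PrintStream that always encodes as UTF-8,
// independent of the JVM's default charset.
PrintStream utf8Stream(OutputStream target) {
    new PrintStream(target, true, 'UTF-8')
}

// Write two Cyrillic letters ("Ад", given as \u escapes) into an in-memory buffer.
def buffer = new ByteArrayOutputStream()
def utf8Out = utf8Stream(buffer)
utf8Out.print('\u0410\u0434')
utf8Out.flush()

// Show the bytes that were actually produced.
println buffer.toByteArray().collect { String.format('%02X', it & 0xFF) }.join(' ')   // D0 90 D0 B4
```

In a real script the same helper could wrap the console stream, for example `System.setOut(utf8Stream(System.out))`, after which `System.out.println` would emit UTF-8 bytes.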