
I am trying to read some data via the console and write it to a file. I run into problems when the data from the console contains umlaut characters: '?' is printed instead of the umlauts. My code is below. Can someone please help me?

        String cmd = "cmd /C si viewproject" + cmdLine
                + " --recurse --fields=indent,name --project=" + name;

        Process p = Runtime.getRuntime().exec(cmd);
        // No charset is given, so the process output is decoded with the platform default charset
        BufferedReader in = new BufferedReader(new InputStreamReader(
                p.getInputStream()));
        String line = null;

        File filename = new File(System.getProperty("java.io.tmpdir"),
                "Project.tmp");

        // The output file itself is encoded as UTF-8
        OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(filename),
                Charset.forName("UTF-8").newEncoder());
        while ((line = in.readLine()) != null) {
            osw.write(line);
            osw.write("\n");
        }
        osw.close();
        in.close();
user1688404
  • Is the process definitely using UTF-8 to write its data? Also note that you're using the platform default character encoding when writing to the file - I'd suggest using UTF-8 (via OutputStreamWriter wrapping a FileOutputStream instead of FileWriter) – Jon Skeet Jan 16 '13 at 12:16
  • You seem to use Windows => it is unlikely that the console is using UTF-8. You can read [this post](http://stackoverflow.com/questions/13348811/get-list-of-processes-on-windows-in-a-charset-safe-way) to get a better understanding of how the console code pages work on Windows. – assylias Jan 16 '13 at 12:17
  • [This problem](http://stackoverflow.com/questions/3862320/failing-to-write-german-umlauts-aou-from-console-to-text-file-with-java) appears to be the same; you may find an answer there. – Matt Jan 16 '13 at 12:18
  • @JonSkeet: I have changed from FileWriter to an OutputStreamWriter wrapping a FileOutputStream. The active code page in cmd is 850. Any pointers on how to proceed? – user1688404 Jan 17 '13 at 11:04
  • @user1688404: Well, we still don't know what encoding the process is trying to use when talking to you, and you haven't said which charset you're using with the OutputStreamWriter... or indeed how you're then reading the file. – Jon Skeet Jan 17 '13 at 11:15
  • @JonSkeet: How do I check what encoding the process is using? I have updated the code in the question. – user1688404 Jan 17 '13 at 12:21
  • @user1688404: I honestly don't know, really. But one option would be to get rid of the reader/writer side entirely on the Java side - just dump the *binary* data from the input stream straight to disk, at least for analysis. – Jon Skeet Jan 17 '13 at 12:23
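The raw-dump idea from the last comment can be sketched roughly like this. It is a minimal, hypothetical helper (the names `RawDump`, `dumpRaw`, `toHex` and the file `Project.bin` are illustrative, not from the thread), assuming Java 7 or later and that the command to run is passed as program arguments. It copies the process's standard output to a file using plain byte streams, so no Reader or Writer can re-encode anything, and then prints the bytes in hex.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RawDump {
    // Copies the stream byte for byte into the target file. No Reader or Writer
    // is involved, so no charset decoding or encoding can alter the data.
    static long dumpRaw(InputStream in, Path target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java RawDump <command> [args...]");
            return;
        }
        // Hypothetical usage: pass the si command line from the question as arguments.
        Process p = new ProcessBuilder(args).start();
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "Project.bin");
        try (InputStream in = p.getInputStream()) {
            long n = dumpRaw(in, target);
            System.out.println(n + " bytes written to " + target);
        }
        System.out.println(toHex(Files.readAllBytes(target)));
    }
}
```

How an umlaut shows up in the hex output hints at the encoding: ä is the single byte 84 in code page 850, E4 in windows-1252 or ISO-8859-1, C3 A4 in UTF-8, and E4 00 in UTF-16LE.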

1 Answer


Try starting cmd with cmd /U and reading the input as UTF-16LE.

See the question "What encoding/code page is cmd.exe using?" for more information.
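A minimal sketch of that suggestion, assuming Java 7 or later; the class name `UnicodeCmd` and the `echo` demo command are illustrative only. One caveat: `cmd /?` describes /U as making the output of *internal* commands (such as `echo` or `dir`) Unicode when it goes to a pipe or file. An external program such as `si` chooses its own output encoding, so /U may not change what it writes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UnicodeCmd {
    // Reads all lines from a stream that is assumed to be UTF-16LE encoded,
    // which is what cmd /U produces for the output of its internal commands.
    static List<String> readUtf16le(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_16LE))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // echo is an internal cmd.exe command, so /U applies to its output.
        // Unicode escapes keep the umlauts independent of the source-file encoding.
        Process p = new ProcessBuilder("cmd", "/U", "/C", "echo", "Gr\u00fc\u00dfe").start();
        for (String line : readUtf16le(p.getInputStream())) {
            System.out.println(line);
        }
    }
}
```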

jontro
  • I ran chcp and it returns "Active code page: 850". I also started cmd with cmd /U and then tried to display a file with umlaut characters in it. It still returns some weird characters. – user1688404 Jan 16 '13 at 12:45
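If `si` writes its output in the console's OEM code page (850, as chcp reports), one option would be to decode the stream with that code page instead of the platform default, and keep the UTF-8 writer for the file. The thread does not confirm which encoding `si` uses, so treat this as a sketch under that assumption. The class name `Cp850Copy` is hypothetical. `IBM850` (alias `Cp850`) is Java's name for code page 850; it belongs to the JDK's extended charsets rather than the set every JVM must support.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp850Copy {
    // OEM code page 850, the value chcp reported. Assumption: this is also the
    // encoding the child process uses for its output.
    static final Charset CONSOLE = Charset.forName("IBM850");

    // Decodes the stream as code page 850 and writes each line to out,
    // ending lines with "\n" as the question's code does.
    static void copyLines(InputStream in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, CONSOLE));
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line);
            out.write("\n");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java Cp850Copy <command> [args...]");
            return;
        }
        // Hypothetical usage: pass the full command line, e.g. cmd /C si viewproject ...
        Process p = new ProcessBuilder(args).start();
        File target = new File(System.getProperty("java.io.tmpdir"), "Project.tmp");
        try (InputStream in = p.getInputStream();
             Writer out = new OutputStreamWriter(new FileOutputStream(target), StandardCharsets.UTF_8)) {
            copyLines(in, out);
        }
    }
}
```

The only change from the question's approach is the explicit charset on the InputStreamReader; the file is still written as UTF-8, so whatever reads it afterwards must also use UTF-8.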