4

I need to read a text file into a String, which I then have to pass as an input parameter (of type InputStream) to IFile.create (Eclipse). I have been looking for an example of how to do that but still cannot figure it out... I need your help!

Just for testing, I tried to convert the original text file to UTF-8 with this code:

FileInputStream fis = new FileInputStream(FilePath);
InputStreamReader isr = new InputStreamReader(fis);

Reader in = new BufferedReader(isr);
StringBuffer buffer = new StringBuffer();

int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char)ch);
}
in.close();


FileOutputStream fos = new FileOutputStream(FilePath+".test.txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(buffer.toString());
out.close();

But even though the final *.test.txt file has UTF-8 encoding, the characters inside are corrupted.

Steven R. Loomis
JackBauer

1 Answer

9

You need to specify the encoding of the InputStreamReader using the Charset parameter.

                                    // ↓ whatever the input's encoding is
Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);

This also works:

InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
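For reference, here is a minimal sketch of the whole round trip under the assumption that the input really is ISO-8859-1: decode the file with its actual charset, then either write it back out as UTF-8 or wrap the String as an InputStream (which is the type the question says IFile.create expects). The class name `CharsetConvert` and its method names are made up for illustration, and `java.nio.file.Files` is used here instead of the stream classes from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of any library.
class CharsetConvert {
    // Decode the whole file using the charset it was actually written in.
    static String readText(Path file, Charset inputCharset) throws IOException {
        return new String(Files.readAllBytes(file), inputCharset);
    }

    // Wrap a String as a stream of UTF-8 bytes for an API that takes an InputStream.
    static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Re-encode: read with the source charset, write the same text as UTF-8.
    static void convertToUtf8(Path source, Charset inputCharset, Path target) throws IOException {
        String text = readText(source, inputCharset);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("latin1", ".txt");
        Path dst = Files.createTempFile("utf8", ".txt");
        // "caf\u00e9" encoded as ISO-8859-1 stands in for the real input file.
        Files.write(src, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));
        convertToUtf8(src, StandardCharsets.ISO_8859_1, dst);
        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.UTF_8));
    }
}
```

If the charset passed to `readText` does not match the file, the decoded String is already wrong, and writing it as UTF-8 afterwards cannot repair it.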

See also this SO search on detecting an encoding in Java: https://stackoverflow.com/search?q=java+detect+encoding


You can get the default charset, which comes from the system the JVM is running on, at runtime via Charset.defaultCharset().
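As a small illustration (the class name `DefaultCharsetDemo` is hypothetical), the sketch below shows that a reader created without a charset simply reports the JVM default, so `getEncoding()` says nothing about the file itself. Note that `getEncoding()` may return a historical alias such as "UTF8", which `Charset.forName` should resolve to the canonical charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of any library.
class DefaultCharsetDemo {
    // Returns the charset a reader falls back to when none is specified.
    static Charset charsetOfPlainReader() throws IOException {
        try (InputStreamReader isr =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]))) {
            // The reader never looked at any data; it just reports its configured encoding.
            return Charset.forName(isr.getEncoding());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("JVM default:  " + Charset.defaultCharset());
        System.out.println("Plain reader: " + charsetOfPlainReader());
    }
}
```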

Community
Matt Ball
  • Thank you for the reply, but I'm getting the encoding from isr (isr.getEncoding()); doesn't it already know what the encoding is? – JackBauer Dec 08 '10 at 02:35
  • Am I right that I have to do something like: InputStreamReader isr1 = new InputStreamReader(fis); Charset inputCharset = Charset.forName(isr1.getEncoding()); InputStreamReader isr = new InputStreamReader(fis, inputCharset); ? – JackBauer Dec 08 '10 at 02:36
  • @Jack: nope, that's not how it works. There's really no way to **know** the encoding of an arbitrary chunk of text. If you haven't specified the encoding of the `InputStreamReader`, then the reader will have (therefore `isr.getEncoding()` will return) the **default** encoding. – Matt Ball Dec 08 '10 at 02:39
  • I see, thank you! so, then, what is the best way to set Charset.forName("ISO-8859-1"); without hardcoding it? Assume the text file is created on the same PC. – JackBauer Dec 08 '10 at 02:53
  • @Jack: if the text file is created on the same PC, **using the default charset**, then there should be no need to pass in a non-default charset. – Matt Ball Dec 08 '10 at 02:59
  • 2
    @Jack: hey, I thought you said the file's encoding was known. What gives? :P – Matt Ball Dec 08 '10 at 03:09
  • Agreed, I stated that, but I did not understand exactly what it meant, sorry about that. – JackBauer Dec 08 '10 at 06:37
  • 2
    http://www.joelonsoftware.com/articles/Unicode.html is recommended reading, first of all for myself! – JackBauer Dec 08 '10 at 06:43
  • @Jack: so what was the solution you arrived at? Hardcoding the charset? I am curious. – Matt Ball Dec 08 '10 at 14:10
  • @Matt: Probably will use icu4j to guess the encoding, since I really do not know what the encoding of the file is. – JackBauer Dec 08 '10 at 23:55
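When the encoding is truly unknown, one stdlib-only heuristic (this is not icu4j, and not a real detector) is to attempt a strict UTF-8 decode and fall back to a caller-chosen charset if the bytes are not valid UTF-8. The sketch below assumes that approach; the class and method names are made up for illustration, and the fallback charset is still a guess the caller has to make.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class StrictDecode {
    // Try strict UTF-8 first; if the bytes are not valid UTF-8, decode with the fallback.
    static String decodeUtf8OrFallback(byte[] bytes, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: use the caller's best guess.
            return new String(bytes, fallback);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeUtf8OrFallback(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Pure ASCII input decodes the same way in both branches, and some non-UTF-8 files happen to be valid UTF-8 byte sequences, so this can only reduce the guessing, not remove it.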