
I have a byte array, which I wrap in an InputStreamReader and then do some manipulations with.

Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr));

The JVM's default encoding is cp1252, but the file I convert to the byte array is UTF-8 encoded, and it contains German umlauts. When I wrap the byte array in an InputStreamReader, Java decodes the umlauts into the wrong characters; for example, ü is shown as Ã¼. I tried passing "UTF-8" and Charset.forName("UTF-8").newDecoder() to the InputStreamReader constructor, and I also tried re-decoding the strings read from the reader via new String(oldStr.getBytes("cp1252"), "UTF-8"), but neither helped. In the debugger, the reader variable has a StreamDecoder field whose "decoder" is an MS1252$Decoder. Maybe this is the key to my problem, but I don't understand how to fix it.
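The symptom can be reproduced in isolation. The sketch below (class and method names are hypothetical, chosen for illustration) encodes a word as UTF-8 and then decodes the same bytes as windows-1252, which is what a reader falling back to a cp1252 default does. Unicode escapes keep the source file ASCII-only, so the demo does not itself depend on the compiler's source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode text as UTF-8, then decode those bytes with another charset.
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        String word = "Zur\u00fcckziehung"; // the word from the question, u-umlaut is U+00FC
        // The u-umlaut is the two bytes 0xC3 0xBC in UTF-8; windows-1252 maps them to
        // two separate characters, U+00C3 (A with tilde) and U+00BC (one quarter).
        String wrong = decodeUtf8BytesAs(word, Charset.forName("windows-1252"));
        String right = decodeUtf8BytesAs(word, StandardCharsets.UTF_8);
        System.out.println(wrong.equals("Zur\u00c3\u00bcckziehung")); // true
        System.out.println(right.equals(word));                      // true
    }
}
```

Only decoding with the charset the bytes were actually written in restores the text reliably. Repairing afterwards with `new String(oldStr.getBytes("cp1252"), "UTF-8")` happens to work for this word, but it is fragile: a few byte values (for example 0x81) have no windows-1252 mapping and are already replaced during the first, wrong decode.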

Evgeny Mironenko

2 Answers


Try using the InputStreamReader(InputStream in, String charsetName) constructor and set the charset yourself.

Reader reader = new InputStreamReader(new ByteArrayInputStream(byteArr), "UTF-8");
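A self-contained version of this answer might look like the sketch below. It uses the `InputStreamReader(InputStream, Charset)` overload with `StandardCharsets.UTF_8` (Java 7+) instead of the charset-name string, which avoids the checked `UnsupportedEncodingException`; the class and the `readUtf8` helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadUtf8Bytes {
    // Hypothetical helper: decode a UTF-8 byte array through a Reader,
    // independent of the JVM default charset.
    static String readUtf8(byte[] byteArr) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(byteArr), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            // A ByteArrayInputStream does not really fail, but Reader.read declares IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] byteArr = "Zur\u00fcckziehung".getBytes(StandardCharsets.UTF_8);
        System.out.println(byteArr.length);             // 14: the u-umlaut takes two bytes
        System.out.println(readUtf8(byteArr).length()); // 13 characters after decoding as UTF-8
    }
}
```

The decoding is fixed at the moment the `InputStreamReader` is created, so the same reader can be handed on to whatever consumes it afterwards.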
Pavlo K.
  • It seems you are reading or manipulating the data the wrong way. By the way, cp1252 supports German. Can you show the code that reads the file and outputs the result? – Pavlo K. Oct 14 '13 at 13:25
  • Maybe try setting the default JVM encoding as described here: http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding – Pavlo K. Oct 14 '13 at 13:30
  • Maybe OpenCSV damages the file. After I get the Reader, I pass it to OpenCSV's parse method. For example, a file containing only one word, Zurückziehung, comes out as ZurÃ¼ckziehung. – Evgeny Mironenko Oct 14 '13 at 13:38
  • With ANSI encoding I have no trouble, but I need to work with UTF-8. – Evgeny Mironenko Oct 14 '13 at 13:39
  • I think this is the solution to your problem: http://stackoverflow.com/questions/1695699/parse-csv-files-that-contain-unicode-character-using-opencsv – Pavlo K. Oct 14 '13 at 13:42
  • That's not the solution; I already do the same thing: I create an InputStreamReader with "UTF-8" encoding and pass it to CSVReader. – Evgeny Mironenko Oct 14 '13 at 14:01
  • So maybe your .txt file is not UTF-8 encoded? I can't find any other explanation. – Pavlo K. Oct 14 '13 at 14:18
  • Looks like your file is in cp1252 and you are trying to read and manipulate it as UTF-8. – Pavlo K. Oct 14 '13 at 14:30
  • Notepad++ says that it's a UTF-8 file :) – Evgeny Mironenko Oct 14 '13 at 14:39
  • Let's look at the bytes: ö is C3B6 in UTF-8. In windows-1252, C3 -> Ã and B6 -> ¶, so ö -> Ã¶. So you should find the place in your code where you still use cp1252 and damage the data. Step through it with the debugger, check the data at each step, and find where it happens. – Pavlo K. Oct 14 '13 at 14:44
  • It was an environment problem. I solved it with Charset. But thank you for the help. – Evgeny Mironenko Oct 18 '13 at 11:48
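The byte-level check from the comments can be scripted, which makes "check the data at each step" easier than reading debugger views. A minimal sketch, with hypothetical `hex` and `codePoints` helpers (`codePoints` needs Java 8+): dump raw bytes before decoding and code points after decoding, and look for the step where C3 B6 turns into two separate characters instead of U+00F6.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteInspector {
    // Hypothetical helper: format raw bytes as hex pairs, e.g. "C3 B6".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: list the characters of a decoded String as code points.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00f6".getBytes(StandardCharsets.UTF_8); // o-umlaut as UTF-8
        System.out.println(hex(utf8)); // C3 B6
        // Wrong decode: each byte becomes its own windows-1252 character.
        System.out.println(codePoints(new String(utf8, Charset.forName("windows-1252")))); // U+00C3 U+00B6
        // Correct decode: one character again.
        System.out.println(codePoints(new String(utf8, StandardCharsets.UTF_8))); // U+00F6
    }
}
```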

I had exactly the same error and finally solved it by adding this to the JVM startup options:

-Dfile.encoding=UTF8
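To confirm that such a flag actually took effect, a small check like the sketch below (class and method names hypothetical) prints the `file.encoding` property and `Charset.defaultCharset()`, which is what `InputStreamReader` falls back to when no charset is passed. Note that since Java 18 the default charset is UTF-8 out of the box (JEP 400), and passing the charset explicitly, as in the other answer, does not depend on startup options at all.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    // Hypothetical helper: summarize the encoding settings the running JVM actually uses.
    static String report() {
        return "file.encoding = " + System.getProperty("file.encoding")
                + System.lineSeparator()
                + "default charset = " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run e.g. with: java -Dfile.encoding=UTF8 DefaultCharsetCheck
        System.out.println(report());
    }
}
```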
mcflyfr