I have a folder tree in French. While I'm reading its folders/files, it returns Ã© instead of é. I replace the character, but that is not a good solution. How can I fix this? I found some answers on Google, but they didn't help me.
Thanks!
When starting the application, set the encoding to UTF-8:
java -Dfile.encoding="UTF-8" YourMainClass
Note that many Java classes cache the encoding; therefore, if you change the encoding at run time, the change may not reach all of the classes you are concerned with.
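To check whether the startup option actually took effect, it can help to print what the JVM believes its default encoding is. The following is only a minimal sketch; the class name ShowDefaultCharset and the describe() helper are made up for illustration.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    // Hypothetical helper: reports the file.encoding property and the
    // charset the JVM uses when no charset is passed explicitly.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with and once without -Dfile.encoding="UTF-8" shows whether the option changes anything on your system.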
Copying the explanation from tchrist in his answer to another question:
A \N{LATIN SMALL LETTER E WITH ACUTE} character is code point U+00E9. In UTF-8, that is \xC3\xA9. But if you turn around and treat those two bytes as distinct code points U+00C3 and U+00A9, those are \N{LATIN CAPITAL LETTER A WITH TILDE} and \N{COPYRIGHT SIGN}, respectively.
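The explanation quoted above can be reproduced in a few lines of Java. This is a sketch, not code from the question: the class MojibakeDemo and its two methods are hypothetical names, and ISO-8859-1 is assumed as the "wrong" charset because it maps each byte to the code point with the same value.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the raw bytes, then decode them as UTF-8.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9";            // U+00E9, written as an escape
        String garbled = misdecode(original);  // two chars: U+00C3, U+00A9
        for (char c : garbled.toCharArray()) {
            System.out.println("U+" + String.format("%04X", (int) c));
        }
        System.out.println(repair(garbled).equals(original));
    }
}
```

The repair step only works when the wrong decoding lost no bytes, which is why fixing the encoding at the source is preferable to replacing characters afterwards.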
You are facing an encoding problem.
Any string is actually a sequence of bits. To make them readable, we use mappings from groups of bits to a character representation we can read. Those 'maps' are what is called an encoding.
The problem you are having arises because you are reading bits encoded using one 'map' and displaying them using another 'map'.
Be sure to use the same encoding and always check if your string manipulation functions work with the encoding being used. It is fundamental for proper working of your application.
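As an illustration of using the same 'map' on both sides, here is a small sketch that writes text to a temporary file with one charset and reads it back with another. The class SameEncodingRoundTrip and the roundTrip method are invented for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SameEncodingRoundTrip {
    // Writes text with one charset and reads it back with another,
    // always naming the charset instead of relying on the platform default.
    public static String roundTrip(String text, Charset writeWith, Charset readWith) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(writeWith));
                return new String(Files.readAllBytes(tmp), readWith);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String word = "caf\u00e9";
        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(word));
        // Mismatched charsets: the text comes back garbled.
        System.out.println(roundTrip(word, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(word));
    }
}
```

Passing the charset explicitly at every read and write is what keeps the result independent of the machine's default settings.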
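I used the code below to print é; with it, writing Unicode text to a file from Java works:
// Append to outputFile; FileWriter encodes with the platform default charset.
FileWriter writer1 = new FileWriter(outputFile, true);
BufferedWriter writer2 = new BufferedWriter(writer1);
// Re-decode the default-charset bytes as ISO-8859-1 before writing.
String str = new String(stringBuffer.toString().getBytes(), "ISO-8859-1");
writer2.write(str);
// Flushing the BufferedWriter also flushes the underlying FileWriter.
writer2.flush();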
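An alternative that does not depend on the platform default is to name the charset when opening the writer. This is a sketch of that swapped-in technique rather than the approach used above; the class ExplicitCharsetWriter, the appendUtf8 method, and the file name output.txt are assumptions for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ExplicitCharsetWriter {
    // Appends text to the file, always encoding it as UTF-8.
    public static void appendUtf8(Path file, String text) {
        try (BufferedWriter writer = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "output.txt" is a placeholder path for this example.
        appendUtf8(Paths.get("output.txt"), "\u00e9\n");
    }
}
```

Whatever reads the file afterwards must then also be told that the content is UTF-8.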
This typically happens when you're not decoding the text with the right encoding (probably UTF-8).
If you want a more precise answer, post your code so we can try to correct it.
The code is displaying the right bits — what is wrong is that the thing you are using to look at those bits has been told that the bits are in a different encoding than they actually are.
This is not a Java problem. This is a problem with whatever software you are using to look at the Java output. For example, your Terminal encoding might be set to ISO-8859-15 rather than the UTF-8 that Java is emitting.
It really helps to have an all–UTF-8 workflow for the external world, and an internal world of abstract Unicode code points.
I suppose it is possible that you are misreading some input: input that is in UTF-8 but which you are treating as being in some legacy 8-bit encoding. But my best guess is the one already given, that your display device/program's encoding is mis-set.
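If the terminal is set to UTF-8, one way to make sure Java emits UTF-8 regardless of its default is to wrap the output stream with an explicit encoding. A minimal sketch, assuming a UTF-8 terminal; the class Utf8Console and the utf8Stream helper are hypothetical names.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps any output stream in a PrintStream that encodes text as UTF-8.
    public static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("\u00e9"); // emitted as the two bytes C3 A9
    }
}
```

If the output still looks wrong after this, the bytes are correct and the setting to change is the terminal's or viewer's encoding.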