Java - é becomes Ã© - How to fix it

Question

I have a folder tree with French names. When I read its folders and files, Java returns Ã© instead of é. I replace the character, but that is not a good solution. How can I fix this? I found some answers on Google, but they didn't help me.

Thanks!

You could start off by posting the code you're using. Chances are you're just reading using the default character encoding when it should probably be UTF-8, but we can't tell without seeing your code. — Jon Skeet, Apr 25 '13 at 07:12
Also note the operating system and the default locale set on your system. For instance, on Windows with the Russian locale/language set, all filenames are encoded in `Cp866`. I think other languages use another non-Unicode encoding on Windows. — , Apr 25 '13 at 07:18
I'm accessing child folders by giving the root folder path. The root path name is in English. I'm just using `new File(rootPath)`, nothing special. If a folder or file name contains é, Java is unable to locate it. — user2172625, Apr 25 '13 at 07:19
This looks like a UTF-8 byte sequence decoded using a legacy encoding (e.g. windows-1252 or ISO-8859-15.) Ensure the JRE's default encoding matches the system default encoding. — McDowell, Apr 25 '13 at 08:00
What do `System.getProperty("file.encoding")` and `Charset.defaultCharset()` return? — Afriza N. Arief, Apr 25 '13 at 09:10
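To answer the last comment, a minimal sketch that prints both values could look like the following. The class name `EncodingReport` and the helper `defaultCharsetName` are made up for illustration; the two calls are the ones named in the comment.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original question.
class EncodingReport {
    /** Returns the name of the charset the JVM uses when none is given explicitly. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Both values depend on the OS, the locale, and the JVM start-up options.
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset() = " + defaultCharsetName());
    }
}
```

If either value is a legacy 8-bit encoding while the data is UTF-8, that mismatch would be consistent with the Ã© symptom.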

score 4 · Answer 1 · edited May 23 '17 at 12:24

When starting the application, set the encoding to UTF-8:

java -Dfile.encoding="UTF-8" YourMainClass

Note that, as mentioned in the link above, many Java classes cache the encoding; therefore, if you change the encoding at run time, the change may not affect all of the classes we are concerned with.

Copying the explanation from tchrist's answer to another question:

A \N{LATIN SMALL LETTER E WITH ACUTE} character is code point U+00E9. In UTF-8, that is \xC3\xA9.

But if you turn around and treat those two bytes as distinct code points U+00C3 and U+00A9, those are \N{LATIN CAPITAL LETTER A WITH TILDE} and \N{COPYRIGHT SIGN}, respectively.
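The effect described above can be reproduced in a few lines. This is only a sketch: the class `MojibakeDemo` and its two helpers are hypothetical names, and ISO-8859-1 stands in for whichever legacy encoding is actually involved.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ISO-8859-1 is an assumed stand-in for the legacy encoding.
class MojibakeDemo {
    /** Encodes text as UTF-8, then wrongly decodes those bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Reverses misdecode: recovers the original bytes, then decodes them as UTF-8. */
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // é written as an escape so the source file's own encoding cannot interfere
        String e = "\u00e9";
        String garbled = misdecode(e);
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+00C3, then U+00A9
        }
        System.out.println(repair(garbled).equals(e)); // true
    }
}
```

The repair step is lossless here only because ISO-8859-1 maps every byte value to a character. It shows where the damage comes from; fixing the decoding at the source is still the better cure.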

edited May 23 '17 at 12:24 by Community · answered Apr 25 '13 at 09:18 by Afriza N. Arief

It probably makes more sense to set the encoding explicitly somewhere in the code instead of globally in the VM. Note that the `file.encoding` property doesn't even control all of the default encodings in the Java SE standard library. – Harold R. Eason May 10 '13 at 19:30
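A sketch of what setting the encoding explicitly in code might look like when reading a file. It assumes the file is known to be UTF-8; the class `ExplicitRead`, the helper `firstLine`, and the scratch file are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; assumes the input file is UTF-8.
class ExplicitRead {
    /** Reads the first line of a file, decoding it as UTF-8 regardless of file.encoding. */
    static String firstLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("encoding-demo", ".txt"); // scratch file for the demo
        Files.write(tmp, "caf\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstLine(tmp).equals("caf\u00e9")); // true
        Files.delete(tmp);
    }
}
```

Because the charset is passed at the call site, the result does not change when the JVM is started with a different `-Dfile.encoding` value.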

score 2 · Answer 2 · answered Apr 25 '13 at 07:17

You are facing an encoding problem.

Any string is actually a sequence of bits. To make them readable, we map groups of bits to characters we can read. Such a 'map' is called an encoding.

The problem you are having arises because you are reading bits encoded with one 'map' and displaying them using another.

Be sure to use the same encoding throughout, and always check that your string manipulation functions work with the encoding being used. This is fundamental to the proper working of your application.
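One way to see that the 'map' matters for string handling: the same one-character string occupies a different number of bytes under different encodings. The class `ByteLengths` and the helper `encodedLength` below are made-up names for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class.
class ByteLengths {
    /** Number of bytes needed to store text in the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // é
        System.out.println(e.length());                                    // 1 (one char)
        System.out.println(encodedLength(e, StandardCharsets.ISO_8859_1)); // 1 byte
        System.out.println(encodedLength(e, StandardCharsets.UTF_8));      // 2 bytes
    }
}
```

Code that counts, truncates, or splits on bytes therefore has to know which encoding produced them.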

score 1 · Answer 3 · edited Jan 20 '17 at 08:21

I used the code below to write é (a Java Unicode string) to a file, and it works:

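// Uses java.io.FileWriter and java.io.BufferedWriter.
// FileWriter encodes with the default charset; true = append.
writer1 = new FileWriter(outputFile, true);
writer2 = new BufferedWriter(writer1);
// Re-decode the default-charset bytes as ISO-8859-1; the result depends on the default charset.
String str = new String(stringBuffer.toString().getBytes(), "ISO-8859-1");
writer2.write(str);
// Flushing the BufferedWriter also flushes the underlying FileWriter.
writer2.flush();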
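As an alternative to re-decoding through ISO-8859-1, a writer can be opened with the charset named up front. This is a sketch, not the answerer's code: it assumes UTF-8 is the desired file encoding, and `ExplicitAppend` and `appendUtf8` are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; assumes the output file should be UTF-8.
class ExplicitAppend {
    /** Appends text to a file as UTF-8, creating the file if it does not exist. */
    static void appendUtf8(Path file, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(text);
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("append-demo", ".txt"); // scratch file for the demo
        appendUtf8(out, "\u00e9");
        byte[] bytes = Files.readAllBytes(out);
        System.out.printf("%02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // C3 A9
        Files.delete(out);
    }
}
```

With this approach the bytes on disk are the same on every platform, whatever the default charset happens to be.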

edited Jan 20 '17 at 08:21 by Rav · answered Jan 20 '17 at 07:35 by karthikeyan paneerselvam

score 0 · Answer 4 · answered Apr 25 '13 at 07:16

This typically happens when you're not decoding the text with the right encoding (probably UTF-8).

If you want a more precise answer, post your code so we can try to correct it.

answered Apr 25 '13 at 07:16 by Padrus

score 0 · Answer 5 · answered Apr 27 '13 at 15:20

The code is displaying the right bits. What is wrong is that the thing you are using to look at those bits has been told that they are in a different encoding than they actually are.

This is not a Java problem. This is a problem with whatever software you are using to look at the Java output. For example, your Terminal encoding might be set to ISO-8859-15 rather than the UTF-8 that Java is emitting.

It really helps to have an all–UTF-8 workflow for the external world, and an internal world of abstract Unicode code points.

I suppose it is possible that you are misreading some input: input that is in UTF-8 but which you are decoding as some legacy 8-bit encoding. But my best guess is the one already given, that the encoding of your display device or program is set incorrectly.
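If the display side is the suspect, one way to take the JVM default out of the picture is to wrap standard output in a `PrintStream` with a named encoding. A minimal sketch, with `Utf8Stdout` and `utf8` as hypothetical names; the terminal itself still has to be set to UTF-8 for the output to look right.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical example class; assumes the terminal expects UTF-8.
class Utf8Stdout {
    /** Wraps an output stream so that printed text is always encoded as UTF-8. */
    static PrintStream utf8(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8"); // true = autoflush on println
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        out.println("\u00e9"); // emits the bytes C3 A9 followed by a line separator
    }
}
```

If this prints é correctly while plain `System.out.println` does not, the JVM's default encoding and the terminal's encoding disagree.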
