I am having trouble writing the following string to a file correctly, especially the character "œ". The problem appears both on my local machine (Windows 7) and on the server (Linux).

String: "Cœurs d’artichauts grillées"

  1. Works (œ is displayed correctly, but the apostrophe is turned into a question mark):

    Files.write(path, content.getBytes(StandardCharsets.ISO_8859_1));
    
  2. Does not work (result in file):

    Files.write(path, content.getBytes(StandardCharsets.UTF_8));
    

According to the first answer of this question, UTF-8 should be able to encode the œ correctly as well. Does anyone have an idea what I am doing wrong?

Karol Dowbecki
onionknight
  • If the first method "works", it seems you don't want UTF-8. Because that one writes ISO-8859-1. Are you sure that whatever you use to display the output really wants UTF-8? – Thilo May 15 '18 at 08:46
  • It is also possible that your String `content` is already broken. In case of a String literal, what is your Java source file encoding? It has to match what your editor thinks it should be. – Thilo May 15 '18 at 08:50
  • @Thilo Problem found. The string was broken beforehand. Broken String: "Curs d’artichauts grillées" How it should look: "Cœurs d’artichauts grillées" EDIT: The broken string doesn't get shown correctly here either. – onionknight May 15 '18 at 12:29
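
A side note on the first variant: ISO-8859-1 has no mapping for the right single quotation mark (U+2019), which matches the question mark seen in the file. It has no mapping for œ (U+0153) either, so a correct string would be expected to lose that character as well. The following is a minimal sketch for checking this; the class name `Latin1Check` and the helper `latin1CanEncode` are made up for illustration.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {

    // Returns true if ISO-8859-1 has a mapping for the given character.
    static boolean latin1CanEncode(char c) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        System.out.println(latin1CanEncode('\u00e9')); // e with acute accent: true
        System.out.println(latin1CanEncode('\u0153')); // oe ligature: false
        System.out.println(latin1CanEncode('\u2019')); // right single quotation mark: false

        // String.getBytes replaces unmappable characters with the
        // charset's replacement byte, which is '?' for ISO-8859-1.
        byte[] bytes = "\u0153".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println((char) bytes[0]); // ?
    }
}
```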

1 Answer

Your second approach works:

    String content = "Cœurs d’artichauts grillées";
    Path path = Paths.get("out.txt");
    Files.write(path, content.getBytes(Charset.forName("UTF-8")));

It produces an out.txt file containing:

Cœurs d’artichauts grillées

Most likely the editor you are using is not displaying the content correctly. You might have to force your editor to read the file as UTF-8 and to use a font that contains œ and other non-ASCII characters. Notepad++ and IntelliJ IDEA work out of the box.
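
To rule out the editor entirely, the bytes on disk can be checked from Java itself. The following is a minimal sketch, not a definitive test: the class name `Utf8RoundTrip`, the helper `toHex`, and the use of a temporary file are assumptions made for illustration. The string is written with Unicode escapes so that the result does not depend on the encoding of the source file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {

    // Formats bytes as space-separated uppercase hex pairs, e.g. "C5 93".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the literal independent of the source file encoding.
        String content = "C\u0153urs d\u2019artichauts grill\u00e9es";

        // Temporary file used only for this demonstration.
        Path path = Files.createTempFile("utf8-demo", ".txt");
        try {
            Files.write(path, content.getBytes(StandardCharsets.UTF_8));

            // Show exactly which bytes ended up in the file.
            byte[] written = Files.readAllBytes(path);
            System.out.println(toHex(written));

            // Decoding the bytes as UTF-8 should give back the original string.
            String decoded = new String(written, StandardCharsets.UTF_8);
            System.out.println(decoded.equals(content));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

In the hex output, œ should appear as `C5 93`, the apostrophe (U+2019) as `E2 80 99`, and é as `C3 A9`. If those sequences are in the file, the write was correct and any garbling comes from how the file is being viewed.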

Karol Dowbecki
  • Don't forget about the byte order mark (BOM), which helps your text editor/viewer auto-detect the charset. See https://stackoverflow.com/questions/49520409/android-utf-8-encoding-not-working/49520531#49520531 – Victor Gubin May 15 '18 at 08:52
  • I am using both of those tools, but I will check what you have said. Thank you for the quick response. – onionknight May 15 '18 at 08:52
  • @onionknight you can always inspect the file bytes with a hex editor to see how the content was written. – Karol Dowbecki May 15 '18 at 08:54
  • The BOM would be added as `Files.write(path, ("\uFEFF" + content).getBytes(StandardCharsets.UTF_8));` – Joop Eggen May 15 '18 at 08:56
  • My mistake. IntelliJ made me think the original string was correct. I copied the string out of debug mode, which surprisingly worked. – onionknight May 15 '18 at 12:35
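
Since the string turned out to be broken before it was written, it can help to look at the code points the `String` actually holds rather than trusting how an IDE or debugger renders it. The following is a minimal sketch; the class name `CodePointDump` and the helper `describe` are hypothetical.

```java
public class CodePointDump {

    // Lists each code point of the string as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escape for the oe ligature, so the source encoding does not matter.
        String content = "C\u0153urs";
        System.out.println(describe(content));
        // prints: U+0043 U+0153 U+0075 U+0072 U+0073
    }
}
```

A healthy string contains U+0153 where the œ belongs. Anything else in that position means the damage happened before `Files.write` was called, for example when the text was read or when the source file was compiled with the wrong encoding.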