197

My current code creates a file in the Windows-1252 code page, and I want to force it to create a UTF-8 file instead.

The code below works as it stands, but I need the save to use UTF-8. Can I pass a parameter or something similar?

This is what I have; any help is really appreciated.

var out = new java.io.FileWriter( new java.io.File( path )),
        text = new java.lang.String( src || "" );
    out.write( text, 0, text.length() );
    out.flush();
    out.close();
Adam Paynter
mark smith

10 Answers

225

Instead of using FileWriter, create a FileOutputStream. You can then wrap this in an OutputStreamWriter, which allows you to pass an encoding in the constructor. Then you can write your data to that inside a try-with-resources statement:

try (OutputStreamWriter writer =
             new OutputStreamWriter(new FileOutputStream(PROPERTIES_FILE), StandardCharsets.UTF_8)) {
    // do stuff
}
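To fill in the "do stuff" part, here is a minimal, self-contained sketch of the same approach. The class name Utf8WriterSketch, the helper writeUtf8 and the output file name are made up for illustration; only the java.io and java.nio classes come from the JDK.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriterSketch {

    // Hypothetical helper: writes text to the given path as UTF-8,
    // regardless of the platform default encoding.
    public static void writeUtf8(String path, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(path), StandardCharsets.UTF_8)) {
            writer.write(text);
        } // close() flushes the writer and closes the underlying stream
    }

    public static void main(String[] args) throws IOException {
        // "\u00e9" is the letter e with an acute accent:
        // one byte in windows-1252, two bytes (C3 A9) in UTF-8
        writeUtf8("utf8-sketch.txt", "caf\u00e9");
    }
}
```

Opening the resulting file in a hex viewer is a quick way to confirm which encoding was actually used.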
Neuron
skaffman
  • 133
    ... and curse at Sun not putting in a constructor to FileWriter which takes a Charset. – Jon Skeet Jun 16 '09 at 13:42
  • 4
    It does seem like an odd oversight. And they still haven't fixed it. – skaffman Jun 16 '09 at 13:45
  • 4
    @Jon Skeet: Given that FileWriter is a wrapper for FileOutputStream that assumes the default encoding and buffer size, wouldn't that defeat the point? – Powerlord Jun 16 '09 at 13:47
  • Sorry, I meant for OutputStreamWriter, not for FileOutputStream. – Powerlord Jun 16 '09 at 13:49
  • I recommend declaring each resource that implements the Closeable interface (such as "new FileOutputStream") separately, especially if you use try-with-resources; it is good practice and avoids later errors like "IOException: Too many open files". – Luis Carlos Jul 15 '21 at 11:09
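A later note on the comments above: Java 11 added FileWriter constructors that accept a Charset, so on Java 11 or newer the original FileWriter code only needs one extra argument. A minimal sketch, assuming Java 11+ (the class name, method name and file name are hypothetical):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FileWriterCharsetSketch {

    // Requires Java 11 or later: older versions have no FileWriter
    // constructor that takes a Charset.
    public static void write(String path, String text) throws IOException {
        try (FileWriter writer = new FileWriter(path, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        write("filewriter-sketch.txt", "caf\u00e9");
    }
}
```

On Java 10 and earlier this will not compile, and the OutputStreamWriter approach from the answer is still the way to go.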
208

Try this

Writer out = new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream("outfilename"), "UTF-8"));
try {
    out.write(aString);
} finally {
    out.close();
}
nhahtdh
Markus Lausberg
32

Try using FileUtils.write from Apache Commons.

You should be able to do something like:

File f = new File("output.txt"); 
FileUtils.writeStringToFile(f, document.outerHtml(), "UTF-8");

This will create the file if it does not exist.

Neuron
A_M
  • 5
    This also produces a UTF-8 file without a BOM... I don't know if it's relevant or not. – neverMind Oct 01 '13 at 03:20
  • 3
    @Smarty only if you are already using Apache Commons. Otherwise it seems an awful waste to include yet another jar just because you don't want to write a few more characters. – Jason Jan 10 '14 at 01:03
  • I couldn't see a 'write(..)' method in FileUtils class. I checked in the commons IO 1.4 – RRM May 12 '14 at 06:23
  • If you read the Java docs on the link shown in the question, then it tells you the version of the Commons IO API where the write APIs were introduced. It looks like the write APIs were introduced from v2.0 onwards. – A_M May 13 '14 at 08:15
  • Just would like to mention that I used the method FileUtils.writeStringToFile(...) (with commons-io-1.3.1.jar) instead of FileUtils.write(...). – Léa Massiot Jul 19 '14 at 21:31
  • This is the best answer. If for example you want to read from a file with a different encoding, maybe ISO-8859-15 you can also read that file with that encoding using FileUtils.readFileToString(input, "ISO-8859-15") and finally copy that in UTF8 with FileUtils.writeStringToFile. – Jesus Mar 28 '15 at 15:15
23

Since Java 7 you can do the same with Files.newBufferedWriter a little more succinctly:

Path logFile = Paths.get("/tmp/example.txt");
try (BufferedWriter writer = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8)) {
    writer.write("Hello World!");
    // ...
}
Neuron
Nigel_V_Thomas
20

None of the answers given here will work, because Java's UTF-8 writing is bugged.

http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html

Emperorlou
  • As far as I can tell, the bug is this one (since the author of that article doesn't bother to mention it): http://bugs.sun.com/view_bug.do?bug_id=4508058 – Chris Jun 27 '12 at 19:16
  • 4
    The only issue when writing is the missing BOM. No big deal. Reading a file with a BOM on the other hand requires stripping it manually. – Axel Fontaine May 03 '13 at 13:18
  • 2
    UTF-8 doesn't need BOM, so technically the written file is still a valid UTF-8 encoded text file. The bug is with reading an UTF-8 with BOM. – Kien Truong Apr 08 '14 at 14:31
  • 1
    @Chris the bugs.sun.com link is broken. Do you have one that works? – Matthias May 12 '14 at 17:46
  • Still works for me; I'm not logged in or anything. Try googling for bug 4508058. – Chris May 15 '14 at 20:17
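To make the point from the comments concrete: Java's UTF-8 encoder does not write a BOM, and its decoder does not remove one, so a BOM in an input file shows up as a leading U+FEFF character. The following is only a sketch of how one might detect and strip it; the class and method names are invented for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomSketch {

    // True if the bytes begin with the UTF-8 byte order mark EF BB BF.
    public static boolean startsWithUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    // Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF,
    // so it has to be dropped by hand after decoding.
    public static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bom-sketch", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));

        // Check whether the file Java just wrote starts with a BOM
        System.out.println(startsWithUtf8Bom(Files.readAllBytes(file)));

        // Decode and strip a BOM if some other tool put one there
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(stripBom(text));
        Files.deleteIfExists(file);
    }
}
```

If a consumer really needs a BOM, writing "\uFEFF" as the first character of the output is the usual workaround.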
9
var out = new java.io.PrintWriter(new java.io.File(path), "UTF-8");
var text = new java.lang.String( src || "" );
out.print(text);
out.flush();
out.close();
boxofrats
9

Java NIO

As part of Java NIO, the Java 7 Files utility type is useful for working with files:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.io.IOException;
import java.util.*;

public class WriteReadUtf8 {
  public static void main(String[] args) throws IOException {
    List<String> lines = Arrays.asList("These", "are", "lines");

    Path textFile = Paths.get("foo.txt");
    Files.write(textFile, lines, StandardCharsets.UTF_8);

    List<String> read = Files.readAllLines(textFile, StandardCharsets.UTF_8);

    System.out.println(lines.equals(read));
  }
}

The Java 8 version allows you to omit the Charset argument - the methods default to UTF-8.

Files.write(textFile, lines);
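If you want to convince yourself of that default, a small check along these lines should do. It is a sketch, not part of the original answer; the class name DefaultCharsetCheck and the helper sameBytes are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class DefaultCharsetCheck {

    // Writes the same lines with and without an explicit charset and
    // reports whether the resulting bytes are identical.
    public static boolean sameBytes(List<String> lines) throws IOException {
        Path explicit = Files.createTempFile("explicit", ".txt");
        Path implicit = Files.createTempFile("implicit", ".txt");
        try {
            Files.write(explicit, lines, StandardCharsets.UTF_8);
            Files.write(implicit, lines); // Java 8+: encodes as UTF-8
            return Arrays.equals(Files.readAllBytes(explicit),
                                 Files.readAllBytes(implicit));
        } finally {
            Files.deleteIfExists(explicit);
            Files.deleteIfExists(implicit);
        }
    }

    public static void main(String[] args) throws IOException {
        // Non-ASCII characters make an encoding difference visible
        System.out.println(sameBytes(Arrays.asList("caf\u00e9", "na\u00efve")));
    }
}
```

Note that this default applies to the java.nio.file.Files methods; the older java.io classes such as FileWriter are a separate matter.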
Basil Bourque
McDowell
4

We can write a UTF-8 encoded file in Java with PrintWriter, for example to write UTF-8 encoded XML:

PrintWriter out1 = new PrintWriter(new File("C:\\abc.xml"), "UTF-8");
4

The sample code below reads a file line by line and writes a new file in UTF-8. The input encoding is specified explicitly as Cp1252.

public static void main(String[] args) throws IOException {
    // Read the source as Cp1252 and re-encode it as UTF-8.
    // try-with-resources closes both streams, even if an exception is thrown.
    try (BufferedReader br = new BufferedReader(new InputStreamReader(
                 new FileInputStream("c:\\filenonUTF.txt"), "Cp1252"));
         Writer out = new BufferedWriter(new OutputStreamWriter(
                 new FileOutputStream("c:\\fileUTF.txt"), "UTF-8"))) {
        String line;
        while ((line = br.readLine()) != null) {
            out.write(line);
            out.write("\n");
        }
    }
}
Ammad
0

Here is an example of writing UTF-8 characters in the Eclipse IDE and to a File.

For Eclipse, simply set the encoding to UTF-8 under Run -> Run Configurations -> Common.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;

public class UTF_8_Example {

    /**
     * Example of printing UTF-8 characters inside Eclipse IDE and a File.
     * <p>
     * For eclipse, you must go to Run ->  Run Configurations -> Common 
     * and set Encoding to UTF-8.
     * <p>
     * @param args
     */
    public static void main(String[] args) {
        BufferedWriter writer = null;

        try {
            ///////////////////////////////////////////////////////////////////
            // WRITE UTF-8 WITHIN ECLIPSE EDITOR
            ///////////////////////////////////////////////////////////////////         
            char character = '►';
            int code = character;
            char hex = '\u25ba';
            String value = "[" + Integer.toHexString(code) + "][\u25ba][" + character + "][" + (char)code + "][" + hex + "]";
            System.out.println(value);

            ///////////////////////////////////////////////////////////////////
            // WRITE UTF-8 TO A FILE
            ///////////////////////////////////////////////////////////////////
            File file = new File("UTF_8_EXAMPLE.TXT");
            FileOutputStream fileOutputStream = new FileOutputStream(file);
            OutputStreamWriter outputStreamWriter = new OutputStreamWriter(fileOutputStream, "UTF-8");
            writer = new BufferedWriter(outputStreamWriter);
            writer.write(value);
        }
        catch(Throwable e) {
            throw new RuntimeException(e);
        }
        finally {
            try {
                if(writer != null) { writer.close(); }
            }
            catch(Throwable e) {
                throw new RuntimeException(e);              
            }
        }
    }   
}