
I am trying to write a Java utility that writes out a UTF-8 file containing just the characters I explicitly write to it. I wrote the following code to do the trick.

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class FileGenerator {

    public static void main(String[] args) {
        char content = 0xb5; // U+00B5 MICRO SIGN
        String filename = "SPTestOutputFile.txt";

        // Encode explicitly as UTF-8; try-with-resources flushes and closes
        // the writer even if write() throws.
        try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(filename), StandardCharsets.UTF_8))) {
            bw.write(content);
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }

        System.out.println("Done");
    }

}

I also pass -Dfile.encoding=UTF-8 as a VM argument.

The character that I am trying to write does get written to the file, but I also get an Â before it, so when I try to write out µ I actually get Âµ. Does anyone know how to correct this so that I always get just µ?

Thanks

  • http://stackoverflow.com/questions/1001540/how-to-write-a-utf-8-file-with-java – Boren Mar 17 '15 at 20:50
  • What you got is in fact the proper UTF8 sequence for a single [micro sign](http://www.fileformat.info/info/unicode/char/00b5/index.htm) -- when read with a viewer that *does not interpret* UTF8 sequences. So please clarify your "I actually get ..." – Jongware Mar 17 '15 at 20:53
  • Looks good; check this table: http://www.utf8-zeichentabelle.de/ ... `µ` is `c2 b5` - exactly what is written to the file. – Trinimon Mar 17 '15 at 20:54

2 Answers


The implementation works just fine: the UTF-8 representation for µ is c2 b5. That is exactly what is written to the file.
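To check the byte sequence without a hex editor, here is a minimal sketch that encodes U+00B5 with `String.getBytes(StandardCharsets.UTF_8)` and prints the bytes in hex. The class and helper names (`Utf8Bytes`, `toHex`) are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bytes {

    // Formats bytes as lowercase hex pairs separated by spaces, e.g. "c2 b5".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so bytes >= 0x80 (negative in Java) print as two hex digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String micro = "\u00b5"; // U+00B5 MICRO SIGN
        byte[] utf8 = micro.getBytes(StandardCharsets.UTF_8);
        System.out.println(micro.length() + " char -> " + utf8.length + " bytes: " + toHex(utf8));
    }
}
```

Running it should print `1 char -> 2 bytes: c2 b5`: one character in Java, two bytes in UTF-8.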

Check the UTF-8 table here: http://www.utf8-zeichentabelle.de/

[Image: the file opened in a hex editor]
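The same check can be done on the file itself. This sketch writes µ the way the question's code does (a `BufferedWriter` over an `OutputStreamWriter` using UTF-8) and reads the raw bytes back with `Files.readAllBytes`. It assumes a temporary file in place of SPTestOutputFile.txt, and the names `DumpWrittenFile` and `writeAndReadBack` are hypothetical. Java's UTF-8 encoder does not emit a byte-order mark, so only the two bytes c2 b5 should come back.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DumpWrittenFile {

    // Writes one char as the question's program does, then returns the raw bytes in the file.
    static byte[] writeAndReadBack(char c) {
        try {
            // Temporary file used instead of SPTestOutputFile.txt (assumption for this demo).
            Path file = Files.createTempFile("utf8-test", ".txt");
            try {
                try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                        Files.newOutputStream(file), StandardCharsets.UTF_8))) {
                    bw.write(c);
                }
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeAndReadBack('\u00b5'); // U+00B5 MICRO SIGN
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        // Expect just c2 b5: no EF BB BF byte-order mark and no other extra bytes.
        System.out.println(bytes.length + " bytes: " + hex.toString().trim());
    }
}
```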

Trinimon

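Your txt file contains two bytes:

  1. 0xC2, the lead byte of the UTF-8 sequence
  2. 0xB5

Together they encode the single character µ. If your application (some reader) recognizes the encoding correctly, you see only µ. Otherwise the application interprets each byte as a separate character (in ISO-8859-1 or Windows-1252, 0xC2 is Â and 0xB5 is µ), and you see Âµ or something else.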

So your text file is OK.
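Where the extra character comes from can be shown by decoding those same two bytes twice: once as UTF-8 and once as ISO-8859-1, a single-byte encoding in which every byte becomes one character (Windows-1252 maps 0xC2 and 0xB5 the same way). The sketch prints code points instead of the characters themselves, so the output does not depend on the console's encoding; the class name `DecodeTwoWays` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class DecodeTwoWays {

    // The two bytes the question's program writes for U+00B5 MICRO SIGN.
    static final byte[] WRITTEN = {(byte) 0xc2, (byte) 0xb5};

    // A UTF-8-aware reader combines both bytes into one character.
    static String decodeAsUtf8() {
        return new String(WRITTEN, StandardCharsets.UTF_8);
    }

    // A single-byte reader turns each byte into its own character:
    // 0xC2 -> U+00C2 (A with circumflex), 0xB5 -> U+00B5 (micro sign).
    static String decodeAsLatin1() {
        return new String(WRITTEN, StandardCharsets.ISO_8859_1);
    }

    // Formats each code point as U+XXXX so the printed output is plain ASCII.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:      " + codePoints(decodeAsUtf8()));
        System.out.println("ISO-8859-1: " + codePoints(decodeAsLatin1()));
    }
}
```

It should print `U+00B5` for the UTF-8 reader and `U+00C2 U+00B5` (that is, Âµ) for the single-byte reader, which matches what the question describes.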

rumis