I am trying to read files using FileReader and write them into a separate file.
These files are UTF-8 encoded, but unfortunately some of them still contain a BOM.
The relevant code I tried is this:

private final String UTF8_BOM = "\uFEFF";

private String removeUTF8BOM(String s)
{
    if (s.startsWith(UTF8_BOM))
    {
        s = s.replace(UTF8_BOM, "");
    }
    return s;
}

line = removeUTF8BOM(line);

But for some reason the BOM is not removed. Is there any other way I can do this with FileReader? I know that BOMInputStream should work, but I'd rather find a solution using FileReader.

Afzaal Ahmad Zeeshan

2 Answers

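FileReader is an old utility class that uses the platform default encoding. On Windows that is likely not UTF-8, so the three BOM bytes (EF BB BF) are not decoded as the single character U+FEFF and the startsWith check never matches.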

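It is best to read with a class that lets you specify the encoding, for example an InputStreamReader wrapped around a FileInputStream.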
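A minimal sketch of that approach, assuming the files really are UTF-8: decode explicitly as UTF-8 and drop a leading U+FEFF from the first line only. The class name Utf8LineReader and its method names are hypothetical, chosen just for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class Utf8LineReader
{
    private static final char BOM = '\uFEFF';

    // Removes a single leading BOM character, if present.
    static String stripBom(final String s)
    {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    // Reads all lines as UTF-8, independent of the platform default encoding.
    static List<String> readLines(final InputStream in) throws IOException
    {
        final List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8)))
        {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null)
            {
                // Only the very first line of a file can start with a BOM.
                lines.add(first ? stripBom(line) : line);
                first = false;
            }
        }
        return lines;
    }
}
```

For a file, the stream could come from new FileInputStream(path). From Java 11 on, FileReader also has constructors that accept a Charset, which may be enough if you want to stay with FileReader.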

For amusement, and to clarify the error, here is a dirty hack that works on platforms with single-byte encodings:

private final String UTF8_BOM = new String("\uFEFF".getBytes(StandardCharsets.UTF_8));

This takes the UTF-8 bytes of the BOM and decodes them into a String using the current platform encoding, which reproduces the characters FileReader actually delivers.
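To make the mismatch concrete, here is a small sketch that decodes the BOM bytes with ISO-8859-1, used here only as a stand-in for a single-byte platform encoding. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
class BomMisdecodeDemo
{
    // The three bytes that encode U+FEFF in UTF-8.
    static final byte[] BOM_BYTES = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decodes the BOM bytes with a single-byte encoding (assumed: ISO-8859-1).
    static String misdecodedBom()
    {
        return new String(BOM_BYTES, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes as UTF-8.
    static String decodedBom()
    {
        return new String(BOM_BYTES, StandardCharsets.UTF_8);
    }

    public static void main(final String[] args)
    {
        System.out.println(misdecodedBom().length());             // 3
        System.out.println(misdecodedBom().startsWith("\uFEFF")); // false
        System.out.println(decodedBom().equals("\uFEFF"));        // true
    }
}
```

With a single-byte encoding the line starts with three unrelated characters, so a comparison against "\uFEFF" cannot succeed.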

Needless to say, FileReader is non-portable and deals only with local files.

Joop Eggen

Naive Solution to the question as asked:

public static void main(final String[] args)
{
    final String hasbom = "\uFEFF" + "Hello World!";
    final String nobom = hasbom.charAt(0) == '\uFEFF' ? hasbom.substring(1) : hasbom;
    System.out.println(hasbom.equals(nobom));
}

Outputs:

false

Proper Solution Approach:

Avoid programming against a File-based API; program against InputStream/OutputStream instead, so that your code is portable to different source locations.

This is just an untested example of how you might go about encapsulating this behavior into an InputStream to make it transparent.

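import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomProofInputStream extends InputStream
{
    // A UTF-8 BOM is the three bytes EF BB BF, not the single char U+FEFF.
    private static final int BOM_LENGTH = 3;

    private final PushbackInputStream is;

    private boolean isFirstRead = true;

    public BomProofInputStream(final InputStream is)
    {
        this.is = new PushbackInputStream(is, BOM_LENGTH);
    }

    @Override
    public int read() throws IOException
    {
        if (this.isFirstRead)
        {
            this.isFirstRead = false;
            final byte[] head = new byte[BOM_LENGTH];
            int n = 0;
            while (n < head.length)
            {
                final int b = is.read();
                if (b < 0) { break; }
                head[n++] = (byte) b;
            }
            final boolean isBom = n == BOM_LENGTH && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
            // Not a BOM: push the bytes back so they are read normally.
            if (!isBom) { is.unread(head, 0, n); }
        }
        return is.read();
    }

    @Override
    public void close() throws IOException { is.close(); }
}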
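Since the original goal was to write the content into a separate file, the same idea can also be applied as a plain stream-to-stream copy. This is only a sketch under the assumption that the input is UTF-8; the class name BomSkippingCopy and the method copyWithoutBom are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of any library.
class BomSkippingCopy
{
    // Copies in to out, omitting a leading UTF-8 BOM (EF BB BF) if present.
    static void copyWithoutBom(final InputStream in, final OutputStream out)
        throws IOException
    {
        // Read up to three bytes; a short stream simply yields fewer.
        final byte[] head = new byte[3];
        int n = 0;
        while (n < head.length)
        {
            final int r = in.read(head, n, head.length - n);
            if (r < 0) { break; }
            n += r;
        }
        final boolean hasBom = n == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
        // If the first bytes are not a BOM they are real content: keep them.
        if (!hasBom) { out.write(head, 0, n); }

        // Copy the rest unchanged.
        final byte[] buf = new byte[8192];
        int r;
        while ((r = in.read(buf)) != -1)
        {
            out.write(buf, 0, r);
        }
    }
}
```

Because it works on bytes, this copy never decodes the text, so the platform encoding does not come into play at all.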

Found a full-fledged example with some searching:

Community