I'm working with some files that might be either UTF-8 or ANSI (Cp1252 specifically), and I need to load them, make some edits, and then output the file again with the original encoding. However, I haven't had any luck getting my program to output ANSI at all.
My code for loading the text is a simple Scanner with a charsetName specified:
fileScanner = new Scanner(f, CHARACTER_SET);
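For context, here is a minimal self-contained version of the round trip I'm attempting. The class name and the hard-coded CHARACTER_SET value are placeholders; in my real code the charset is detected per file before loading.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.Scanner;

public class EncodingRoundTrip {
    // Placeholder: in the real code this is "UTF8" or "windows-1252",
    // depending on what was detected when the file was loaded
    static final String CHARACTER_SET = "windows-1252";

    // Load the whole file using a Scanner with an explicit charsetName
    static String load(File f) throws IOException {
        try (Scanner fileScanner = new Scanner(f, CHARACTER_SET)) {
            StringBuilder sb = new StringBuilder();
            while (fileScanner.hasNextLine()) {
                sb.append(fileScanner.nextLine()).append('\n');
            }
            return sb.toString();
        }
    }

    // Write the text back out with the same charset
    static void save(File file, String text) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), CHARACTER_SET))) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("roundtrip", ".txt");
        save(f, "caf\u00e9\n");      // \u00e9 (é) is a single byte, 0xE9, in windows-1252
        System.out.println(load(f)); // prints café
    }
}
```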
My current code for writing the file is the following:
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), CHARACTER_SET));
writer.write(this.toString());
System.out.println("Writing " + name + " (" + method + ") using " + CHARACTER_SET + " encoding");
writer.close();
CHARACTER_SET is a String that is either "UTF8" or "windows-1252", depending on which encoding I detected the file to be when loading it.
The file actually outputs fine in either mode, with all the special accented characters I've encountered coming through uncorrupted. The problem is that when I work on a Cp1252 file, the output shows up as UTF-8 even though I initialized the BufferedWriter with a Cp1252 OutputStreamWriter. I can verify the writer's charset because it was set via CHARACTER_SET, and the println right after writing confirms that for ANSI files it used Cp1252. I'm checking the encoding of the output by opening it in Notepad++ and reading what it reports in the bottom-right corner.
I know it seems like I'm splitting hairs a little, but I really do want to leave each file in its original encoding.