
I am building a small application that encodes the text in a text file to Base64 and then decodes it back to normal. The decoded text always has some Chinese characters at the beginning of the first line.

public EncryptionEngine(File appFile){
    this.appFile= appFile;
}


public void encrypt(){

    try {
        byte[] fileText = Files.readAllBytes(appFile.toPath());// get file text as bytes

        Base64.Encoder encoder = Base64.getEncoder();
        PrintWriter writer = new PrintWriter(appFile);

        writer.print("");//erase old, readable text
        writer.print(encoder.encodeToString(fileText));// insert encoded text
        writer.close();


    } catch (IOException e) {

        e.printStackTrace();
    }

}

public void decrypt(){

    try {
        byte[] fileText = Files.readAllBytes(appFile.toPath());

        String s = new String (fileText, StandardCharsets.UTF_8);//String s = new String (fileText);


        Base64.Decoder decoder = Base64.getDecoder();
        byte[] decodedByteArray = decoder.decode(s);

        PrintWriter writer = new PrintWriter(appFile);
        writer.print("");
        writer.print(new String (decodedByteArray,StandardCharsets.UTF_8)); //writer.print(new String (decodedByteArray));
        writer.close();


    } catch (IOException e) {

        e.printStackTrace();
    }



}

Text file before encrypt():

cheese

tomatoes

potatoes

hams

yams

Text file after encrypt():

//5jAGgAZQBlAHMAZQANAAoAdABvAG0AYQB0AG8AZQBzAA0ACgBwAG8AdABhAHQAbwBlAHMADQAKAGgAYQBtAHMADQAKAHkAYQBtAHMA

Text file after decrypt():

뿯붿cheese

tomatoes

potatoes

hams

yams


K.Milli

2 Answers

Your input file is UTF-16, not UTF-8. It begins with FF FE, the little-endian byte order mark. StandardCharsets.UTF_16 will handle this correctly. (Alternatively, set your text editor to save the file as UTF-8 instead of UTF-16.)
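As a small illustration of the point above, the following sketch decodes a few hand-written bytes with StandardCharsets.UTF_16. The class name Utf16BomDemo and the sample bytes are made up for the example; they are not taken from the question's file.

```java
import java.nio.charset.StandardCharsets;

final class Utf16BomDemo {
    // Decodes bytes as UTF-16. The decoder reads the byte order mark
    // to pick the byte order and does not include it in the result.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // FF FE is the little-endian BOM, followed by "ch" in UTF-16LE.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x63, 0x00, 0x68, 0x00};
        System.out.println(decodeUtf16(sample)); // prints "ch"
    }
}
```

Without a byte order mark, this charset falls back to big-endian when decoding, so a BOM-less little-endian file would need StandardCharsets.UTF_16LE instead.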

When you decoded the bytes FF FE as UTF-8, you got two replacement characters "��", one for each of the two bytes that were not valid UTF-8. Then when you printed this out, each replacement character '�' was encoded as EF BF BD in UTF-8. Then you interpreted the result as UTF-16, taking the bytes in groups of two and reading them as EFBF BDEF BFBD. The remainder of the file was UTF-16 the whole time, but its bytes (ASCII values and null bytes) are all valid UTF-8, so they round-trip safely.

(If the file were ASCII text encoded as UTF-16 without a byte order mark, you would not have noticed how broken this was!)
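The mangling described above can be reproduced with a few bytes. This is only a sketch: the class BomManglingDemo and its helper methods are hypothetical names, and the input is the first four bytes of a UTF-16LE file that starts with "c".

```java
import java.nio.charset.StandardCharsets;

final class BomManglingDemo {
    // Decode as UTF-8, then encode as UTF-8 again, which is what happens
    // when decoded bytes are turned into a String and printed back out.
    static byte[] roundTripThroughUtf8(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as lowercase hex for easy comparison.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // FF FE (BOM) followed by 'c' in UTF-16LE (63 00).
        byte[] original = {(byte) 0xFF, (byte) 0xFE, 0x63, 0x00};
        // Each invalid byte becomes U+FFFD, which encodes as EF BF BD.
        System.out.println(hex(roundTripThroughUtf8(original))); // prints efbfbdefbfbd6300
    }
}
```

The two BOM bytes grow into six bytes, while 63 00 passes through unchanged, which matches the garbage appearing only at the start of the file.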

Josh Lee

Your encrypt and decrypt functions don't make the same assumptions. encrypt Base64-encodes any file and is just fine except for the variable names and comments that suggest that the file is a text file. It need not be.

decrypt reverses the Base64-encoded data back to bytes but then "overprocesses" by assuming that the bytes are text encoded with UTF-8, decoding them and re-encoding them before writing them to the file. If the assumption were true, this would just be a no-op. It's clearly not true in your case, and it mangles the data.

Perhaps you did that because you were trying to use a PrintWriter. In Java (and .NET), the many stream and file I/O classes are often confusing, especially considering their decades-long evolution. Sometimes there is one that does exactly what you need but it is hard to find; other times, there isn't. And sometimes a commonly used library like Apache Commons fills the gap.

So, just write the bytes to the file. There are lots of modern and historical options, as explained in the answers to the question "byte[] to file in Java". Here's one with Files.write:

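// TRUNCATE_EXISTING discards leftover Base64 text beyond the decoded length
Files.write(appFile.toPath(), decodedByteArray, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);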
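Putting the advice together, here is a minimal sketch of a bytes-only round trip that never converts the file contents to a String. The class Base64FileCodec and its method names are hypothetical, not the question's EncryptionEngine, and it relies on Files.write creating and truncating the file when no options are passed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

final class Base64FileCodec {
    // Replaces the file's contents with the Base64 encoding of its raw bytes.
    static void encodeInPlace(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        Files.write(file, Base64.getEncoder().encode(raw));
    }

    // Replaces Base64 contents with the decoded bytes, with no charset involved.
    static void decodeInPlace(Path file) throws IOException {
        byte[] encoded = Files.readAllBytes(file);
        Files.write(file, Base64.getDecoder().decode(encoded));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("base64-demo", ".txt");
        // A UTF-16LE sample with a BOM, like the file in the question.
        byte[] original = {(byte) 0xFF, (byte) 0xFE, 0x63, 0x00};
        Files.write(file, original);

        encodeInPlace(file);
        decodeInPlace(file);

        // The bytes survive unchanged, whatever encoding the text used.
        System.out.println(Arrays.equals(original, Files.readAllBytes(file))); // prints true
        Files.delete(file);
    }
}
```

Because the file is treated as opaque bytes in both directions, the same code works for UTF-8, UTF-16, or non-text files.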

Note: While Base64 might have been considered encryption (and cracked) a couple of hundred years ago, it's not intended for that purpose. It's a bit dangerous (and confusing) to call it that.

Tom Blodget