
I am using DatatypeConverter to convert my strings to byte arrays and vice versa. However, when I go from a byte array back to a string, the result is not the same value I started with.

This is a minimal example that runs on ideone:

import java.util.Arrays;
import java.util.Random;
import javax.xml.bind.DatatypeConverter;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        byte[] b = new byte[20];
        new Random().nextBytes(b);

        String s = DatatypeConverter.printBase64Binary(b);
        byte[] newB = DatatypeConverter.parseBase64Binary(s);

        if(!Arrays.equals(b, newB))
            System.out.println(Arrays.toString(b) + " should match " + Arrays.toString(newB));

        s = "Hello world";

        byte[] bytes = DatatypeConverter.parseBase64Binary(s);
        String newS = DatatypeConverter.printBase64Binary(bytes);
        byte[] newBytes = DatatypeConverter.parseBase64Binary(newS);

        if(!s.equals(newS))
            System.out.println(s + " should match " + newS);

        if(!Arrays.equals(bytes, newBytes))
            System.out.println(Arrays.toString(bytes) + " should match " + Arrays.toString(newBytes));
    }
}

I expect this to print nothing: every round trip should produce a match, so none of the if conditions should be true. Instead it outputs:

Hello world should match Hellowor

I see the same issue when running this on my machine as part of unit tests in Java 8.

The weird thing is that when I convert the non-matching strings back into bytes, the byte arrays do match.

masud.m
RichyHBM

2 Answers


The strings don't match because they shouldn't.

The operation printBase64Binary turns an arbitrary byte stream into a sequence of printable ASCII characters. However, this sequence won't just contain any old collection of printable ASCII characters. If a string is a valid Base64 translation of some byte sequence, then there are certain things you can say about it: among other things, it uses only the characters A-Z, a-z, 0-9, + and / (plus = as padding at the end), so it won't contain spaces, and its length will be a multiple of 4.

Let me say that explicitly again: not all Strings are valid Base64 representations.
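To make those constraints concrete, here is a minimal sketch of a shape check for padded Base64 text. The class name `Base64Shape` and the method `looksLikeBase64` are hypothetical names chosen for illustration; they are not part of DatatypeConverter or any library mentioned in this thread, and the regular expression only covers the standard padded alphabet described above.

```java
import java.util.regex.Pattern;

class Base64Shape {
    // Groups of 4 alphabet characters, optionally ending in a padded group
    // ("xx==" or "xxx="). This is an illustrative check, not a decoder.
    private static final Pattern PADDED_BASE64 = Pattern.compile(
        "^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$");

    // Hypothetical helper: true if the text has the shape of padded Base64.
    static boolean looksLikeBase64(String text) {
        return PADDED_BASE64.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Contains a space and has length 11, so it fails the check.
        System.out.println(looksLikeBase64("Hello world"));      // false
        // Base64 encoding of the UTF-8 bytes of "Hello world".
        System.out.println(looksLikeBase64("SGVsbG8gd29ybGQ=")); // true
    }
}
```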

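The operation parseBase64Binary will try its best to interpret the string you give it as a Base64 string and give you back the byte stream that it came from. However, if you give it some string you just made up out of thin air, well, it'll try to interpret it as best it can. The output in the question is consistent with that: "Hello world" comes back as "Hellowor", as if the parser skipped the space and dropped the trailing "ld", which doesn't fill a complete 4-character group.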

So the end result is that this operation:

bytes -> printBase64Binary -> String -> parseBase64Binary -> bytes

is a fine round-trip operation that will always give you back the same array you started with, but this operation:

String -> parseBase64Binary -> bytes -> printBase64Binary -> String

will not give you back the original string for most strings. (Personally, I think it should throw an exception to indicate that you fed it malformed input, but I understand the design goals that led the Java designers to do something different.)
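For comparison, here is a small sketch of both directions using `java.util.Base64` (available since Java 8) instead of DatatypeConverter. This is a swapped-in API, not the one from the question: its basic decoder is strict and throws `IllegalArgumentException` on input such as "Hello world" rather than guessing. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.Random;

class Base64RoundTrip {
    // bytes -> Base64 text -> bytes: recovers the original array.
    static boolean bytesSurviveRoundTrip(byte[] original) {
        String encoded = Base64.getEncoder().encodeToString(original);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Arrays.equals(original, decoded);
    }

    // Returns true if the strict (basic) decoder accepts the text as Base64.
    static boolean strictDecoderAccepts(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown for characters outside the Base64 alphabet, such as a space.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] b = new byte[20];
        new Random().nextBytes(b);
        System.out.println(bytesSurviveRoundTrip(b));            // true
        System.out.println(strictDecoderAccepts("Hello world")); // false
    }
}
```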

Daniel Martin

In my project I convert strings to byte arrays and back using String.getBytes(charset) and new String(byteArray, 0, byteArray.length, charset).

I haven't experienced any problems with this conversion.
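As a rough sketch of that approach, the example below converts text to bytes with an explicit charset and, if a Base64 form is also needed, encodes those bytes rather than parsing the text itself as Base64. UTF-8 and `java.util.Base64` are assumptions made for the example, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class TextBase64 {
    // text -> UTF-8 bytes -> Base64 text
    static String encodeText(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Base64 text -> UTF-8 bytes -> text
    static String decodeText(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, 0, utf8.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "Hello world";
        String encoded = encodeText(s);
        System.out.println(encoded);                       // SGVsbG8gd29ybGQ=
        System.out.println(s.equals(decodeText(encoded))); // true
    }
}
```

Starting from the string's bytes means every input string is valid, so the round trip recovers the original text.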

DangeMask