When I have a byte array in a DTO and convert it to JSON using Jackson's ObjectMapper, the byte array is automatically converted into a Base64 string. Example below.

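import java.io.File;
import java.nio.file.Files;

import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.AllArgsConstructor;
import lombok.Data;

@Data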
@AllArgsConstructor
class TestDTO {
    private byte[] binaryFile;
}

class TestByteSerialization {
    public static void main(String[] args) throws Exception {
        ObjectMapper objectMapper = new ObjectMapper();
        byte[] bytes = Files.readAllBytes(new File("path/to/file/test.pdf").toPath());

        TestDTO dto = new TestDTO(bytes);

        String json = objectMapper.writeValueAsString(dto);
        System.out.println(json);
    }
}

I expected Jackson to convert it to an array of integers like the following:

{
    "binaryFile" : [21, 45, 12, 65, 12, 37, etc]
}

Instead, it was converted to a Base64 string.

{
    "binaryFile" : "ZXhhbXBsZSB0ZXh0IG9ubHkuIEJpbmFyeSBmaWxlIHdhcyBkaWZmZXJlbnQgTE9MLg=="    
}
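
For illustration, the two forms are just different spellings of the same bytes. The following is a minimal sketch that uses only java.util.Base64 from the JDK (not Jackson itself) to decode the string above and print it as an array of numbers; the class name Base64VsArray is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64VsArray {
    // The Base64 string from the JSON output above.
    static final String ENCODED =
        "ZXhhbXBsZSB0ZXh0IG9ubHkuIEJpbmFyeSBmaWxlIHdhcyBkaWZmZXJlbnQgTE9MLg==";

    /** Decodes standard Base64 text back into the original bytes. */
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        byte[] bytes = decode(ENCODED);
        // Java bytes are signed, so values above 127 would print as negatives here.
        System.out.println(Arrays.toString(bytes));
        // Encoding the bytes again reproduces the same string.
        System.out.println(Base64.getEncoder().encodeToString(bytes).equals(ENCODED));
    }
}
```

Either representation can be turned back into the same byte[]; they differ only in how much text they need.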

After researching a bit, it seems JSON does not support byte arrays, as mentioned here. This makes sense, because JSON is a string representation of data.

But I still could not find an answer to why JSON does not support byte arrays. A byte array is still just an array of numbers, right? Why does it need to be converted to a Base64-encoded string? What is wrong in passing byte array as is to the json String as an array of numbers?

For those marking this as an opinion-based question:

Developers definitely wouldn't have thought "Passing bytes as an array of numbers is boring. Let's try some crazy looking encoded string". There has to be some rationale behind this.

Arun Gowda
  • There may be many reasons but one I can think of is size. As a number each byte would require 1-3 characters + the comma and when deserializing it Jackson would at least need 4 bytes for each int until those could be converted to a byte array. Thus you'd need 2-4x more memory whereas with Base64 you'd just need about 1.33x as much memory. – Thomas Apr 19 '21 at 06:28
  • I don't think seeking technical explanation for understanding why things are made to work the way they are is an opinion based question. – Arun Gowda Apr 19 '21 at 06:42
  • JSON supports byte arrays just fine (ok, technically an array of integers), so the premise is wrong; it just generally doesn't make sense to send byte arrays this way. – Mark Rotteveel Apr 19 '21 at 08:41
  • Come on guys. This is not an opinion based question as explained in my edit. Please vote for reopening the same. – Arun Gowda Jul 27 '21 at 04:39

1 Answer

What is wrong in passing byte array as is to the json String as an array of numbers?

Nothing, if you're happy with each byte of input taking (on average, assuming even distribution of bytes) 3.57 characters. That's assuming you don't have a space after each comma - otherwise it's 4.57 characters.
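
As a rough check of those averages, here is a minimal sketch that counts the decimal digits of every unsigned byte value from 0 to 255 and adds the separator. The class and method names are hypothetical, and it assumes bytes are written as unsigned values.

```java
import java.util.Locale;

class CharsPerByte {
    /** Average decimal digits per value over 0..255, plus the separator length per element. */
    static double averageChars(int separatorLength) {
        int totalDigits = 0;
        for (int value = 0; value <= 255; value++) {
            totalDigits += Integer.toString(value).length();
        }
        // 10 one-digit, 90 two-digit and 156 three-digit values: 658 digits in total.
        return totalDigits / 256.0 + separatorLength;
    }

    public static void main(String[] args) {
        // Separator "," has length 1; separator ", " has length 2.
        System.out.println(String.format(Locale.ROOT, "comma only:    %.2f", averageChars(1)));
        System.out.println(String.format(Locale.ROOT, "comma + space: %.2f", averageChars(2)));
    }
}
```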

So compare these data sizes with 10K of data:

  • Raw: 10240 bytes (can't be represented directly in JSON)
  • Base64: 13656 characters
  • Array of numbers: about 36560 characters

The size increase of 33% for base64 is painful enough... the size increase of using an array is much, much worse. So the convention is to use base64 instead. (It's only a convention - it's not like it's baked into the JSON spec. But it's followed by most JSON encoders and decoders.)
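
The comparison can be reproduced with a short sketch using only the JDK. It assumes a sample in which every byte value occurs equally often, writes bytes as unsigned numbers with no spaces, and uses hypothetical class and method names. For this sample the exact array length comes out at 36561 characters, slightly above 10240 × 3.57, because 3.57 is a rounded average and the brackets add two characters.

```java
import java.util.Base64;

class EncodingSizes {
    /** Length of the Base64 text for the given bytes (standard alphabet, with padding). */
    static int base64Length(byte[] data) {
        return Base64.getEncoder().encodeToString(data).length();
    }

    /** Length of a compact JSON array such as [0,1,255], treating each byte as unsigned. */
    static int jsonArrayLength(byte[] data) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(data[i] & 0xFF); // mask to get the unsigned value 0..255
        }
        return sb.append(']').length();
    }

    /** 10240 bytes in which every value 0..255 appears exactly 40 times. */
    static byte[] uniformSample() {
        byte[] data = new byte[10240];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 256);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] data = uniformSample();
        System.out.println("Raw:    " + data.length + " bytes");
        System.out.println("Base64: " + base64Length(data) + " characters");
        System.out.println("Array:  " + jsonArrayLength(data) + " characters");
    }
}
```

Real files rarely have a perfectly even byte distribution, so the array figure will vary somewhat, while the Base64 length depends only on the input size.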

Jon Skeet
  • One follow-up question: for transferring a single binary file, is there any reason why JSON should be preferred over sending the byte resource as `application/octet-stream`? – Arun Gowda Apr 19 '21 at 11:03
  • @ArunGowda: That's a completely different question - very much *not* what Stack Overflow comments are for. – Jon Skeet Apr 19 '21 at 11:05