3

I need to calculate the length of base64 decoded data.

I have Base-64 data, and I am sending the decoded data as the body of an HTTP response (I meant request, but the idea is the same).

I need to send a Content-Length header.

In the interest of memory usage and performance I'd rather not actually Base-64 decode the data all at once, but rather stream it.

Given Base-64 data, how do I calculate what the length of the decoded data will be? I need either a general algorithm or a Java/Scala solution.


EDIT: This is similar to, but not a duplicate of Calculate actual data size from Base64 encoded string length, where the OP asks

...can I calculate the length of the raw data that has been encoded only by looking at the length of the Base64-encoded string?

The answer is no. It is necessary to look at the padding as well.

I want to know how the length and the base64 data can be used to calculate the original length.

Paul Draper
  • possible duplicate of [Calculate actual data size from Base64 encoded string length](http://stackoverflow.com/questions/6816137/calculate-actual-data-size-from-base64-encoded-string-length) – frostmatthew Mar 17 '14 at 23:06
  • Sorry, but I am still confused. What are you writing to the response? Can't the `OutputStream` take care of counting the number of bytes you write? – Sotirios Delimanolis Mar 17 '14 at 23:14
  • "Given base64 data, how do I calculate the length of the decoded data will be?" Decode it and check the length. – developerwjk Mar 17 '14 at 23:15
  • @developerwjk, "In the interest of memory usage and performance I'd rather not actually Base-64 decode the data all at once, but rather stream it." – Paul Draper Mar 17 '14 at 23:44

2 Answers

4

Assuming that you can't just use chunked encoding (and thereby avoid sending a Content-Length header), you need to consult the padding thus:

  • Base64 encodes three binary octets into four characters. Say you have N Base64 characters, where N is a multiple of 4. Let k be the number of trailing '=' characters (i.e. padding characters: 0, 1 or 2).
  • Let M = 3*floor((N-k)/4), i.e. the number of octets in "complete" 3-octet chunks.
  • If you have 2 padding chars then you have M + 1 bytes.
  • If you have 1 padding char then you have M + 2 bytes.
  • If you have 0 padding chars then you have M bytes.

Of course, floor() in this case means truncating integer division, i.e. the normal / operator.
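
The steps above can be sketched in Java roughly as follows. This is only an illustration: the class name `Base64Length` and the methods `decodedLength` and `countPadding` are made up for this example, and it assumes padded Base64 with no line separators.

```java
final class Base64Length {

    /**
     * Decoded length for n Base64 characters, k of which are trailing '='.
     * Assumes padded Base64 without line separators (hypothetical helper).
     */
    static long decodedLength(long n, int k) {
        if (n % 4 != 0) {
            throw new IllegalArgumentException("padded Base64 length must be a multiple of 4");
        }
        if (k < 0 || k > 2 || k > n) {
            throw new IllegalArgumentException("padding count must be 0, 1 or 2");
        }
        // Octets contained in "complete" 3-octet chunks; '/' truncates like floor() here.
        long m = 3 * ((n - k) / 4);
        switch (k) {
            case 2:  return m + 1; // final group holds one octet
            case 1:  return m + 2; // final group holds two octets
            default: return m;     // no partial group
        }
    }

    /** Counts trailing '=' characters (at most two) at the end of a byte array. */
    static int countPadding(byte[] encoded) {
        int k = 0;
        for (int i = encoded.length - 1; i >= 0 && k < 2 && encoded[i] == '='; i--) {
            k++;
        }
        return k;
    }
}
```

For example, the 8 characters `aGVsbG8=` have one padding character, so M = 3*floor(7/4) = 3 and the decoded length is 3 + 2 = 5 bytes.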

Presumably you can count padding octets relatively easily (e.g. by seeking to the end of a file, or by looking at the end of a byte array), without having to read the whole Base64-encoded thing sequentially.
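
If the Base64 text sits in a file, one way to do that might look like the sketch below. It seeks to the last two bytes with `java.io.RandomAccessFile` instead of reading the file sequentially. The class name `Base64FileLength` is hypothetical, and the sketch assumes the file holds nothing but padded Base64: no line separators and no trailing newline.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

final class Base64FileLength {

    /**
     * Decoded length of a file that contains only padded Base64.
     * Assumption: no line separators and no trailing newline in the file.
     */
    static long decodedLength(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long n = raf.length();
            if (n % 4 != 0) {
                throw new IOException("length is not a multiple of 4; not plain padded Base64?");
            }
            // Inspect at most the last two bytes for '=' padding.
            int k = 0;
            for (long pos = n - 1; pos >= 0 && pos >= n - 2; pos--) {
                raf.seek(pos);
                if (raf.read() != '=') {
                    break;
                }
                k++;
            }
            long m = 3 * ((n - k) / 4); // octets in complete 3-octet chunks
            return k == 2 ? m + 1 : k == 1 ? m + 2 : m;
        }
    }
}
```

Only the file length and the final two bytes are touched, so the cost does not grow with the size of the encoded data.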

user3392484
  • Thanks. Chunked encoding is definitely the best way. I am using `com.amazonaws.services.s3.transfer.TransferManager`: "If no content length is specified for the input stream, then TransferManager will attempt to buffer all the stream content upload. Because the entire stream contents must be buffered in memory, this can be very expensive, and should be avoided whenever possible." It's definitely a limitation of this Java library, not HTTP. – Paul Draper Mar 17 '14 at 23:48
  • It's almost right...`L` *must* be divisible by 4 (if `L` is the length of the Base-64 data). – Paul Draper Mar 18 '14 at 00:03
  • indeed - I've updated the answer to say what I wanted it to say, rather than what it actually said... – user3392484 Mar 18 '14 at 09:07
2

I arrived at this simple calculation.

If L is the length of the Base-64 encoded data, and p is the number of padding characters (which will be 0, 1, or 2), then the length of the unencoded data is

L * 3 / 4 - p

In my case (with Scala),

bytes.length * 3 / 4 - bytes.reverseIterator.takeWhile(_ == '=').length

NOTE: This assumes that the data does not have line separators. (Often, Base-64 data has line breaks at a fixed width, for example every 76 characters in MIME.) If it does, exclude the line separators from the length L.
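
For data that may contain line separators, a Java sketch of the same formula could count L and p in one pass over the encoded bytes, skipping CR and LF. The class name `Base64StreamLength` is invented for this example. It assumes that '=' appears only as trailing padding and that the encoded stream can be read a second time for the actual upload.

```java
import java.io.IOException;
import java.io.InputStream;

final class Base64StreamLength {

    /**
     * Computes L * 3 / 4 - p, where L excludes CR/LF line separators.
     * Assumption: '=' occurs only as trailing padding.
     */
    static long decodedLength(InputStream in) throws IOException {
        long l = 0; // Base64 characters, line separators excluded
        int p = 0;  // padding characters
        byte[] buf = new byte[8192];
        int read;
        while ((read = in.read(buf)) != -1) {
            for (int i = 0; i < read; i++) {
                byte b = buf[i];
                if (b == '\r' || b == '\n') {
                    continue; // line separators do not count toward L
                }
                l++;
                if (b == '=') {
                    p++;
                }
            }
        }
        return l * 3 / 4 - p;
    }
}
```

This still reads the encoded data once, but it uses a fixed-size buffer and never decodes anything, so memory use stays constant.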

Paul Draper