
I have a fairly large amount of data (>1 MB) encoded in Base64.

I found a nice library which can help me deal with this as fast as possible.

Its decode function is very basic; it needs an output buffer:

int base64_decode
    ( const char  *src
    , size_t       srclen
    , char        *out
    , size_t      *outlen
    , int          flags
    ) ;

I could use the output buffer from their sample:

char out[1*1024*1024];

But the input size is not constant, and it feels wrong to request that much memory at compile time. On the other hand, a large buffer on the stack should give some speed advantage over data stored and accessed on the heap (source).

But I thought of using a vector<char> instead. I could define it as

std::vector<char> out;

Then when I have the input size, I can resize it:

out.resize(input_size);

Resizing would initialize all its items to 0, which seems impractical and unnecessary to me, since `base64_decode` will overwrite those items in the next step anyway.

Therefore, resize might not be the best choice, but calling reserve won't help either, as it doesn't change the vector's size (although it doesn't initialize the items either).

As I have no prior information on the data size, I either need to use some runtime-resizable buffer, or take a big guess and allocate a huge buffer.

As the library is able to decode really fast, I would like to use the fastest solution for the output buffer too, and both the char array and the vector seem inappropriate. What would be my fastest option then?

Daniel
  • `resize` is probably going to be your fastest solution. It's pretty easy for the CPU to zero out a `char`. My advice: use `vector` and then profile. If the performance is good, then you're done. If it's not, then you at least have data you can use to help choose a better solution. – NathanOliver Apr 29 '21 at 12:23
  • Doesn't `base64_decode` return the amount of bytes it was able to decode? Then you invoke it continuously until it returns 0, meaning all data has been decoded. Using a too-large buffer (>1MB) is bad for cpu cache. But it's also easy to predict Base64 decoded size (is ~input*3/4+4 basically). – rustyx Apr 29 '21 at 12:26
  • No, it doesn't return the decoded size, it puts it into `outlen`. About size estimation, yes, it's not difficult once we have the input data's length. – Daniel Apr 29 '21 at 12:27
  • Your stack array is only going to consume memory for the duration of the function call you are placing it in. And it is zero overhead to allocate/deallocate and initialize. A `std::vector` is going to be fairly slow to allocate/deallocate. – Galik Apr 29 '21 at 12:28
  • So I should do `char buffer[input_size]` right in front of `base64_decode`? – Daniel Apr 29 '21 at 12:29
  • That's not legal `C++`. The size must be a compile time constant. You could make the buffer a reasonable size and do the conversion in chunks if necessary. – Galik Apr 29 '21 at 12:31
  • You can allocate uninitialized memory with `::operator new (input_size)` which is basically syntax sugar for `malloc`. But then writing to it for the first time will be much slower, because virtual memory pages are allocated on first access. It may be better to pre-allocate a fixed buffer at program startup instead and not worry about the zero-filling overhead. – rustyx Apr 29 '21 at 12:35
  • `char buffer[input_size]` is supported by GCC and clang, but isn't Standard C++. Still, 1MB is a fair bit to put on the stack, a bit rude. BTW, it's possible to have a `vector` of `struct X { X() { } X(std::byte b) : b_{b} { } std::byte b_; };`; then vector resizing doesn't zero out the memory (due to the no-op default constructor, since C++11), but it's less convenient to access the data afterwards. Still, the first thing I'd do is consider how the output will be used: if you can use the output in chunks, just decode a bit at a time into a smaller stack-allocated buffer, use, repeat. – Tony Delroy Apr 29 '21 at 12:44
  • Where is this data going after you have decoded it? – Galik Apr 29 '21 at 12:54
  • The Standard may require zero-initializing, but a reasonable compiler could spot the following write without an intervening read or mutex lock. Removing redundant writes is a pretty common optimization. – MSalters Apr 29 '21 at 13:13

1 Answer


Stack allocations are indeed a lot faster than heap allocations, but (1) you might not have 1MB of stack and more importantly (2) decoding 1MB of Base-64 text takes more time than the allocation itself, even with a fast library.

Note that Base-64 decoding is exactly a 4:3 ratio, rounded up. Hence if you know srclen, then (srclen+3)/4*3 is the required outlen. The main reason why it's an out-parameter is that rounding: because of padding, the actual length might be a byte or two shorter.

MSalters
  • So to conclude you're saying I'm pretty much good to go with the vector? – Daniel Apr 29 '21 at 13:21
  • Yup. The main exception would be if you want to return the data as a `std::string`, in which case you'd want to decode directly into an object of that type. But that's still the same heap (aka `std::allocator`) as vector. – MSalters Apr 29 '21 at 13:24