31

I've been trying to figure out the OpenSSL documentation for base64 decoding and encoding. I found the code snippets below:

#include <openssl/sha.h>
#include <openssl/hmac.h>
#include <openssl/evp.h>
#include <openssl/bio.h>
#include <openssl/buffer.h>

char *base64(const unsigned char *input, int length)
{
  BIO *bmem, *b64;
  BUF_MEM *bptr;

  b64 = BIO_new(BIO_f_base64());
  bmem = BIO_new(BIO_s_mem());
  b64 = BIO_push(b64, bmem);
  BIO_write(b64, input, length);
  BIO_flush(b64);
  BIO_get_mem_ptr(b64, &bptr);

  char *buff = (char *)malloc(bptr->length);
  memcpy(buff, bptr->data, bptr->length-1);
  buff[bptr->length-1] = 0;

  BIO_free_all(b64);

  return buff;
}

char *decode64(unsigned char *input, int length)
{
  BIO *b64, *bmem;

  char *buffer = (char *)malloc(length);
  memset(buffer, 0, length);

  b64 = BIO_new(BIO_f_base64());
  bmem = BIO_new_mem_buf(input, length);
  bmem = BIO_push(b64, bmem);

  BIO_read(bmem, buffer, length);

  BIO_free_all(bmem);

  return buffer;
}

This only seems to work for single-line strings such as "Start". The moment I introduce complex strings with newlines, spaces, etc., it fails horribly.

It doesn't even have to be OpenSSL; a simple class or set of functions that does the same thing would be fine. There's a very complicated build process for the solution, and I am trying to avoid having to go in there and make multiple changes. The only reason I went for OpenSSL is that the solution is already compiled with the libraries.

Cœur
Bernard
  • A pure OpenSSL answer: http://stackoverflow.com/a/33331627/795876 :) – fsenart Oct 25 '15 at 15:53
  • There is similar [question](http://stackoverflow.com/q/342409/5447906), it's for C, but there are C++ answers there: http://stackoverflow.com/a/34201175/5447906 – anton_rh Dec 10 '15 at 12:05

10 Answers

48

Personally, I find the OpenSSL API to be so incredibly painful to use, I avoid it unless the cost of avoiding it is extremely high. I find it quite upsetting that it has become the standard API in the crypto world.

I was feeling bored, and I wrote you one in C++. This one should even handle the edge cases that can cause security problems, like, for example, encoding a string that results in integer overflow because it's too large.

I have done some unit testing on it, so it should work.

#include <string>
#include <cassert>
#include <limits>
#include <stdexcept>
#include <cctype>

static const char b64_table[65] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

static const char reverse_table[128] = {
   64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
   64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
   64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 62, 64, 64, 64, 63,
   52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64, 64, 64, 64, 64, 64,
   64,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
   15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 64, 64, 64, 64, 64,
   64, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
   41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 64, 64, 64, 64, 64
};

::std::string base64_encode(const ::std::string &bindata)
{
   using ::std::string;
   using ::std::numeric_limits;

   if (bindata.size() > (numeric_limits<string::size_type>::max() / 4u) * 3u) {
      throw ::std::length_error("Converting too large a string to base64.");
   }

   const ::std::size_t binlen = bindata.size();
   // Use = signs so the end is properly padded.
   string retval((((binlen + 2) / 3) * 4), '=');
   ::std::size_t outpos = 0;
   int bits_collected = 0;
   unsigned int accumulator = 0;
   const string::const_iterator binend = bindata.end();

   for (string::const_iterator i = bindata.begin(); i != binend; ++i) {
      accumulator = (accumulator << 8) | (*i & 0xffu);
      bits_collected += 8;
      while (bits_collected >= 6) {
         bits_collected -= 6;
         retval[outpos++] = b64_table[(accumulator >> bits_collected) & 0x3fu];
      }
   }
   if (bits_collected > 0) { // Any trailing bits that are missing.
      assert(bits_collected < 6);
      accumulator <<= 6 - bits_collected;
      retval[outpos++] = b64_table[accumulator & 0x3fu];
   }
   assert(outpos >= (retval.size() - 2));
   assert(outpos <= retval.size());
   return retval;
}

::std::string base64_decode(const ::std::string &ascdata)
{
   using ::std::string;
   string retval;
   const string::const_iterator last = ascdata.end();
   int bits_collected = 0;
   unsigned int accumulator = 0;

   for (string::const_iterator i = ascdata.begin(); i != last; ++i) {
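      // Convert via unsigned char: passing a negative value to isspace() is undefined behaviour.
      const int c = static_cast<unsigned char>(*i);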
      if (::std::isspace(c) || c == '=') {
         // Skip whitespace and padding. Be liberal in what you accept.
         continue;
      }
      if ((c > 127) || (c < 0) || (reverse_table[c] > 63)) {
         throw ::std::invalid_argument("This contains characters not legal in a base64 encoded string.");
      }
      accumulator = (accumulator << 6) | reverse_table[c];
      bits_collected += 6;
      if (bits_collected >= 8) {
         bits_collected -= 8;
         retval += static_cast<char>((accumulator >> bits_collected) & 0xffu);
      }
   }
   return retval;
}
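
To make the bit shuffling in base64_encode above concrete, here is a minimal sketch that encodes exactly one 3-byte group by hand. The helper name encode_triplet is made up for this illustration and is not part of the answer's code.

```cpp
#include <string>

// Hypothetical helper for illustration: encodes exactly three bytes into
// four base64 characters. A full 3-byte group never needs '=' padding.
std::string encode_triplet(unsigned char a, unsigned char b, unsigned char c)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Pack the three bytes into one 24-bit value ...
    const unsigned int group = (static_cast<unsigned int>(a) << 16) |
                               (static_cast<unsigned int>(b) << 8) |
                               static_cast<unsigned int>(c);

    // ... and cut it into four 6-bit table indices, most significant first.
    std::string out;
    out += table[(group >> 18) & 0x3fu];
    out += table[(group >> 12) & 0x3fu];
    out += table[(group >> 6) & 0x3fu];
    out += table[group & 0x3fu];
    return out;
}
```

For example, encode_triplet('M', 'a', 'n') gives "TWFu". The full encoder does the same thing with a running accumulator, which is what lets it handle a trailing group of one or two bytes.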
Omnifarious
  • Just a noob question! When we return retval, isn't it a stack variable that should not be returned, since it can be destroyed as soon as the function goes out of scope? – bana Oct 25 '13 at 18:14
  • @bana - the function returns a new std::string object, which is assigned to before the stack object is destroyed, so this is safe. – nevelis Dec 15 '13 at 18:32
  • 16
    @Omnifarious: -1 - even though this is a totally acceptable solution, the question asked how to do it in OpenSSL, so this answer is rather unrelated to the original question and didn't help me ;) You find it "upsetting" that OpenSSL is the industry standard - can you suggest an alternative (without doing it yourself)? The OpenSSL library makes extensive use of machine-specific assembly in many areas for better performance, and the API is oriented around stream-based processing; maybe that's why you feel "upset" about writing a block-based wrapper. Disclaimer: I am not an OpenSSL maintainer. – nevelis Dec 15 '13 at 18:38
  • @Omnifarious here is my pure OpenSSL answer, can you review plz. http://stackoverflow.com/a/33331627/795876 – fsenart Oct 25 '15 at 15:54
  • @nevelis - I will see about writing a version using the OpenSSL API that's C++ish. Unfortunately, the alternatives are piecemeal, and I don't know that any of them are better. And it's not the stream oriented nature of the API that bothers me exactly. Though, it kind of does. OpenSSL should not care at all about things like file descriptors or streams or anything. It should be able to be used over any transport and the caller should handle the details. It's more the excessive bureaucracy of the API that bothers me. So many steps for the smallest thing, for no apparent reason. – Omnifarious Aug 07 '17 at 19:30
  • A lot of the complexity in OpenSSL comes from several things, from the macro hell to the undocumented internals you need to use to do things that even .NET can do with 1 line of code, to trying to support every platform with as much reuse as possible, trying not to add new, untested code, but leverage what's there.. and you end up with garbage anyway. You're right about the alternatives, especially the ones trying to clean it up... You can polish a turd all you want, but it's still a turd. :) – nevelis Aug 08 '17 at 20:50
  • Or an iterator for converting characters to base64: https://codereview.stackexchange.com/questions/248339/base64-iterators – Martin York Sep 02 '20 at 18:18
26

Rather than using the BIO_ interface it's much easier to use the EVP_ interface. For instance:

#include <iostream>
#include <stdlib.h>
#include <openssl/evp.h>

char *base64(const unsigned char *input, int length) {
  const auto pl = 4*((length+2)/3);
  auto output = reinterpret_cast<char *>(calloc(pl+1, 1)); //+1 for the terminating null that EVP_EncodeBlock adds on
  const auto ol = EVP_EncodeBlock(reinterpret_cast<unsigned char *>(output), input, length);
  if (pl != ol) { std::cerr << "Whoops, encode predicted " << pl << " but we got " << ol << "\n"; }
  return output;
}

unsigned char *decode64(const char *input, int length) {
  const auto pl = 3*length/4;
  auto output = reinterpret_cast<unsigned char *>(calloc(pl+1, 1));
  const auto ol = EVP_DecodeBlock(output, reinterpret_cast<const unsigned char *>(input), length);
  if (pl != ol) { std::cerr << "Whoops, decode predicted " << pl << " but we got " << ol << "\n"; }
  return output;
}

The EVP functions include a streaming interface too; see the man page.

mtrw
  • So many examples doing this the hard way. Thank you for posting the `EVP_EncodeBlock` method! – Shibumi May 08 '20 at 14:40
  • Thank you! This is the best way to do it. Simple. – rodolk Jul 13 '20 at 23:32
  • 1
    (1) decode64() result is binary, and so requires a length -- e.g. could return std::pair. (2) Each call to calloc() requires a call to free(). (3) decode64()'s output length must be reduced in the presence of input padding. – badfd Jul 20 '21 at 14:16
10

Here is an example of OpenSSL base64 encode/decode I wrote:

Note that the code uses some macros and classes of my own, but none of them is important for the example. They are simply C++ wrappers I wrote:

buffer base64::encode( const buffer& data )
{
    // bio is simply a class that wraps BIO* and frees the BIO in its destructor.

    bio b64(BIO_f_base64()); // create BIO to perform base64
    BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);

    bio mem(BIO_s_mem()); // create BIO that holds the result

    // chain base64 with mem, so writing to b64 will encode base64 and write to mem.
    BIO_push(b64, mem);

    // write data
    bool done = false;
    int res = 0;
    while(!done)
    {
        res = BIO_write(b64, data.data, (int)data.size);

        if(res <= 0) // if failed
        {
            if(BIO_should_retry(b64)){
                continue;
            }
            else // encoding failed
            {
                /* Handle Error!!! */
            }
        }
        else // success!
            done = true;
    }

    BIO_flush(b64);

    // get a pointer to mem's data
    char* dt;
    long len = BIO_get_mem_data(mem, &dt);

    // assign data to output
    std::string s(dt, len);

    return buffer(s.length()+sizeof(char), (byte*)s.c_str());
}
TCS
9

This works for me, and I verified with valgrind that there are no memory leaks.

#include <openssl/bio.h>
#include <openssl/evp.h>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

#include <iostream>

namespace {
struct BIOFreeAll { void operator()(BIO* p) { BIO_free_all(p); } };
}

std::string Base64Encode(const std::vector<unsigned char>& binary)
{
    std::unique_ptr<BIO,BIOFreeAll> b64(BIO_new(BIO_f_base64()));
    BIO_set_flags(b64.get(), BIO_FLAGS_BASE64_NO_NL);
    BIO* sink = BIO_new(BIO_s_mem());
    BIO_push(b64.get(), sink);
    BIO_write(b64.get(), binary.data(), binary.size());
    BIO_flush(b64.get());
    const char* encoded;
    const long len = BIO_get_mem_data(sink, &encoded);
    return std::string(encoded, len);
}

// Assumes no newlines or extra characters in encoded string
std::vector<unsigned char> Base64Decode(const char* encoded)
{
    std::unique_ptr<BIO,BIOFreeAll> b64(BIO_new(BIO_f_base64()));
    BIO_set_flags(b64.get(), BIO_FLAGS_BASE64_NO_NL);
    BIO* source = BIO_new_mem_buf(encoded, -1); // read-only source
    BIO_push(b64.get(), source);
    const int maxlen = strlen(encoded) / 4 * 3 + 1;
    std::vector<unsigned char> decoded(maxlen);
    const int len = BIO_read(b64.get(), decoded.data(), maxlen);
    decoded.resize(len);
    return decoded;
}

int main()
{
    const char* msg = "hello";
    const std::vector<unsigned char> binary(msg, msg+strlen(msg));
    const std::string encoded = Base64Encode(binary);
    std::cout << "encoded = " << encoded << std::endl;
    const std::vector<unsigned char> decoded = Base64Decode(encoded.c_str());
    std::cout << "decoded = ";
    for (unsigned char c : decoded) std::cout << c;
    std::cout << std::endl;
    return 0;
}

Compile:

g++ main.cc -lcrypto

Output:

encoded = aGVsbG8=
decoded = hello
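
Base64Decode above assumes the encoded string has no newlines or other extra characters. If the input may be line-wrapped, one option is to strip whitespace first. This is a minimal sketch; strip_whitespace is a hypothetical helper of mine, not an OpenSSL function.

```cpp
#include <cctype>
#include <string>

// Hypothetical pre-processing step: returns a copy of `encoded` with all
// whitespace (spaces, tabs, line breaks) removed.
std::string strip_whitespace(const std::string& encoded)
{
    std::string out;
    out.reserve(encoded.size());
    for (const char ch : encoded) {
        // Cast first: passing a negative char value to isspace is undefined behaviour.
        if (!std::isspace(static_cast<unsigned char>(ch))) {
            out += ch;
        }
    }
    return out;
}
```

It could then be combined with the decoder as Base64Decode(strip_whitespace(text).c_str()).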
Matt
7

So many horrible C code examples with buffers and malloc(). What about using std::string properly on this C++-tagged question?

#include <openssl/bio.h>
#include <openssl/evp.h>
#include <openssl/buffer.h>
#include <string>

std::string base64_encode(const std::string& input)
{
    const auto base64_memory = BIO_new(BIO_s_mem());
    auto base64 = BIO_new(BIO_f_base64());
    // Without this flag the filter inserts a newline after every 64 output characters.
    BIO_set_flags(base64, BIO_FLAGS_BASE64_NO_NL);
    base64 = BIO_push(base64, base64_memory);
    BIO_write(base64, input.data(), static_cast<int>(input.length()));
    BIO_flush(base64);
    BUF_MEM* buffer_memory{};
    BIO_get_mem_ptr(base64, &buffer_memory);
    // With BIO_FLAGS_BASE64_NO_NL there is no trailing newline to strip.
    auto base64_encoded = std::string(buffer_memory->data, buffer_memory->length);
    BIO_free_all(base64);
    return base64_encoded;
}
BullyWiiPlaza
6

I like mtrw's use of EVP.

Below is my "modern C++" take on his answer without manual memory allocation (calloc). It will take a std::string but it can easily be overloaded to use raw bytes for example.

#include <openssl/evp.h>

#include <memory>
#include <stdexcept>
#include <string>
#include <vector>


auto EncodeBase64(const std::string& to_encode) -> std::string {
  /// @sa https://www.openssl.org/docs/manmaster/man3/EVP_EncodeBlock.html

  const auto predicted_len = 4 * ((to_encode.length() + 2) / 3);  // predict output size

  const auto output_buffer{std::make_unique<char[]>(predicted_len + 1)};

  const std::vector<unsigned char> vec_chars{to_encode.begin(), to_encode.end()};  // convert to_encode into uchar container

  const auto output_len = EVP_EncodeBlock(reinterpret_cast<unsigned char*>(output_buffer.get()), vec_chars.data(), static_cast<int>(vec_chars.size()));

  if (predicted_len != static_cast<unsigned long>(output_len)) {
    throw std::runtime_error("EncodeBase64 error");
  }

  return output_buffer.get();
}

auto DecodeBase64(const std::string& to_decode) -> std::string {
  /// @sa https://www.openssl.org/docs/manmaster/man3/EVP_DecodeBlock.html

  const auto predicted_len = 3 * to_decode.length() / 4;  // predict output size

  const auto output_buffer{std::make_unique<char[]>(predicted_len + 1)};

  const std::vector<unsigned char> vec_chars{to_decode.begin(), to_decode.end()};  // convert to_decode into uchar container

  const auto output_len = EVP_DecodeBlock(reinterpret_cast<unsigned char*>(output_buffer.get()), vec_chars.data(), static_cast<int>(vec_chars.size()));

  if (predicted_len != static_cast<unsigned long>(output_len)) {
    throw std::runtime_error("DecodeBase64 error");
  }

  return output_buffer.get();
}

There's probably a cleaner/better way of doing this (I'd like to get rid of reinterpret_cast). You'll also definitely want a try/catch block to deal with the potential exception.

Simog
  • Upon return from both functions, you are using pointers after they're freed. Either `move` or `release` your unique pointer; or alternatively return `vector`/`string`, which is the better approach. I do not argue about the conversion logic. Your API is not good. The better function inputs would be `span`/`string_view`. I am avoiding template functions; otherwise a `std::ranges::range` input parameter should've been used. – Red.Wave Aug 13 '23 at 20:54
4

An improved version of TCS's answer, with the macros and custom data structures removed:

unsigned char *encodeb64mem( unsigned char *data, int len, int *lenoutput )
{
// Note: the returned pointer refers to memory owned by the mem BIO, and this function never frees the BIO chain.

BIO *b64 = BIO_new(BIO_f_base64()); // create BIO to perform base64
BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);

BIO *mem = BIO_new(BIO_s_mem()); // create BIO that holds the result

// chain base64 with mem, so writing to b64 will encode base64 and write to mem.
BIO_push(b64, mem);

// write data
bool done = false;
int res = 0;
while(!done)
{
    res = BIO_write(b64, data, len);

    if(res <= 0) // if failed
    {
        if(BIO_should_retry(b64)){
            continue;
        }
        else // encoding failed
        {
            /* Handle Error!!! */
        }
    }
    else // success!
        done = true;
}

BIO_flush(b64);

// get a pointer to mem's data
unsigned char* output;
*lenoutput = BIO_get_mem_data(mem, &output);

// assign data to output
//std::string s(dt, len2);

return output;
}

To write to file

int encodeb64(unsigned char* input, const char* filenm, int leni)
{
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);

BIO *file = BIO_new_file(filenm, "w");
BIO *mem = BIO_new(BIO_f_buffer());
BIO_push(b64, mem);
BIO_push(mem, file);

// write data
bool done = false;
int res = 0;
while(!done)
{
    res = BIO_write(b64, input, leni);

    if(res <= 0) // if failed
    {
        if(BIO_should_retry(b64)){
            continue;
        }
        else // encoding failed
        {
            /* Handle Error!!! */
        }
    }
    else // success!
        done = true;
}

BIO_flush(b64);
BIO_free_all(b64); // frees the whole chain (b64, buffer and file) and closes the file

    return 0;
}

Base64 encoding from file to file. Often, because of file size constraints, we have to read the data in chunks and encode each chunk. The code is below.

int encodeb64FromFile(const char* input, const char* outputfilename)
{
BIO *b64 = BIO_new(BIO_f_base64());
BIO_set_flags(b64,BIO_FLAGS_BASE64_NO_NL);
int leni = 0;
unsigned char data[3*64]; // byte buffer (not an array of pointers)
BIO *file = BIO_new_file(outputfilename, "w");
BIO *mem = BIO_new(BIO_f_buffer());
BIO_push(b64, mem);
BIO_push(mem, file);

FILE *fp = fopen(input, "rb");
while ((leni = fread(data, 1, sizeof data, fp)) > 0) {
    // write data
    bool done = false;
    int res = 0;
    while(!done)
    {
        res = BIO_write(b64, data, leni);

        if(res <= 0) // if failed
        {
            if(BIO_should_retry(b64)){
                continue;
            }
            else // encoding failed
            {
                /* Handle Error!!! */
            }
        }
        else // success!
            done = true;
    }

 }

BIO_flush(b64);
BIO_free_all(b64); // frees the whole chain (b64, buffer and file) and closes the output file
fclose(fp);

return 0;
 }
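
The BIO chain above keeps its own state between writes, so any chunk size works there. If each chunk is instead encoded independently with a one-shot encoder, every chunk except the last should be a multiple of 3 bytes (as the 3*64-byte buffer above happens to be), otherwise padding ends up in the middle of the output. The sketch below illustrates this in plain C++ without OpenSSL; encode_block and encode_in_chunks are names I made up for the demonstration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Plain one-shot encoder, used here only to demonstrate the chunking property.
std::string encode_block(const std::string& in)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    for (; i + 3 <= in.size(); i += 3) {  // full 3-byte groups
        const unsigned int g = (static_cast<unsigned char>(in[i]) << 16) |
                               (static_cast<unsigned char>(in[i + 1]) << 8) |
                               static_cast<unsigned char>(in[i + 2]);
        out += table[(g >> 18) & 0x3f];
        out += table[(g >> 12) & 0x3f];
        out += table[(g >> 6) & 0x3f];
        out += table[g & 0x3f];
    }
    const std::size_t rest = in.size() - i;  // 0, 1 or 2 trailing bytes
    if (rest == 1) {
        const unsigned int g = static_cast<unsigned char>(in[i]) << 16;
        out += table[(g >> 18) & 0x3f];
        out += table[(g >> 12) & 0x3f];
        out += "==";
    } else if (rest == 2) {
        const unsigned int g = (static_cast<unsigned char>(in[i]) << 16) |
                               (static_cast<unsigned char>(in[i + 1]) << 8);
        out += table[(g >> 18) & 0x3f];
        out += table[(g >> 12) & 0x3f];
        out += table[(g >> 6) & 0x3f];
        out += '=';
    }
    return out;
}

// Encodes `in` chunk by chunk and concatenates the results.
// The output matches encode_block(in) only when `chunk` is a multiple of 3.
std::string encode_in_chunks(const std::string& in, std::size_t chunk)
{
    if (chunk == 0) throw std::invalid_argument("chunk size must be positive");
    std::string out;
    for (std::size_t pos = 0; pos < in.size(); pos += chunk) {
        out += encode_block(in.substr(pos, chunk));
    }
    return out;
}
```

With a chunk size of 3 or 6 the chunked result equals the one-shot result; with a chunk size of 4 it does not, because the first chunk already ends in "==".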
Satish
3
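  #include <openssl/bio.h>
  #include <openssl/buffer.h>  // BUF_MEM
  #include <openssl/evp.h>     // BIO_f_base64
  #include <cstdlib>           // malloc, realloc
  #include <cstring>           // memcpy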

  typedef unsigned char byte;      

  namespace base64 {
    static void Encode(const byte* in, size_t in_len,
                       char** out, size_t* out_len) {
      BIO *buff, *b64f;
      BUF_MEM *ptr;

      b64f = BIO_new(BIO_f_base64());
      buff = BIO_new(BIO_s_mem());
      buff = BIO_push(b64f, buff);

      BIO_set_flags(buff, BIO_FLAGS_BASE64_NO_NL);
      BIO_set_close(buff, BIO_CLOSE);
      BIO_write(buff, in, in_len);
      BIO_flush(buff);

      BIO_get_mem_ptr(buff, &ptr);
      (*out_len) = ptr->length;
      (*out) = (char *) malloc(((*out_len) + 1) * sizeof(char));
      memcpy(*out, ptr->data, (*out_len));
      (*out)[(*out_len)] = '\0';

      BIO_free_all(buff);
    }

    static void Decode(const char* in, size_t in_len,
                       byte** out, size_t* out_len) {
      BIO *buff, *b64f;

      b64f = BIO_new(BIO_f_base64());
      buff = BIO_new_mem_buf((void *)in, in_len);
      buff = BIO_push(b64f, buff);
      (*out) = (byte *) malloc(in_len * sizeof(char));

      BIO_set_flags(buff, BIO_FLAGS_BASE64_NO_NL);
      BIO_set_close(buff, BIO_CLOSE);
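      // BIO_read returns <= 0 on failure; do not store a negative value in a size_t.
      const int n = BIO_read(buff, (*out), in_len);
      (*out_len) = (n > 0) ? n : 0;
      (*out) = (byte *) realloc((void *)(*out), ((*out_len) + 1) * sizeof(byte));
      (*out)[(*out_len)] = '\0';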

      BIO_free_all(buff);
    }
  }
fsenart
  • Sorry this is years later after your review request, but... Do not cast to `void *`, especially in C++, but never really, even in C. Also, do not use C-style casts at all in C++ (these are the `(type)` type casts). Always use `static_cast`, `const_cast` or `reinterpret_cast` and use the absolute minimally powerful cast to accomplish the task. Do not use `malloc` in C++ code either. Nor should you allow bare pointers to escape outside the functions that have to deal with OpenSSL. `out_len` should be a reference, not a pointer. Basically, this is a C solution, not a C++ one. – Omnifarious Aug 04 '17 at 19:31
  • 4
    @Omnifarious great code review. he said it's an OpenSSL-only solution though, not a C++ one. I found this useful as a starting point years ago, and though I had to clean it up much as you suggested, it is still an OpenSSL-only solution that fits on one page. If you're taking the effort to critique the answer, why not do the SO thing & suggest an edit to the code? – nevelis Aug 08 '17 at 20:47
  • 1
    @nevelis - The question used the C++ tag, so I assumed the poster wanted a C++ish answer. – Omnifarious May 03 '21 at 17:47
3

Base64 is really pretty simple; you shouldn't have trouble finding any number of implementations via a quick Google. For example here is a reference implementation in C from the Internet Software Consortium, with detailed comments explaining the process.

The OpenSSL implementation layers on a lot of complexity with the "BIO" stuff that's not (IMHO) very useful if all you're doing is decoding/encoding.
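
To give a sense of how little is involved, here is a minimal sketch of the core decoding step for one unpadded group of four characters. The names sextet and decode_quartet are hypothetical, and the range checks assume an ASCII-compatible character set; a complete decoder would also deal with padding and whitespace.

```cpp
#include <stdexcept>
#include <string>

// Maps one base64 character to its 6-bit value; throws on anything else.
// Assumes an ASCII-compatible character set (letters and digits contiguous).
unsigned int sextet(char ch)
{
    if (ch >= 'A' && ch <= 'Z') return static_cast<unsigned int>(ch - 'A');
    if (ch >= 'a' && ch <= 'z') return static_cast<unsigned int>(ch - 'a') + 26;
    if (ch >= '0' && ch <= '9') return static_cast<unsigned int>(ch - '0') + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    throw std::invalid_argument("not a base64 character");
}

// Decodes one unpadded group of four characters into three bytes.
std::string decode_quartet(const std::string& four)
{
    if (four.size() != 4) {
        throw std::invalid_argument("need exactly four characters");
    }
    // Reassemble the 24-bit group from four 6-bit values ...
    const unsigned int group = (sextet(four[0]) << 18) | (sextet(four[1]) << 12) |
                               (sextet(four[2]) << 6) | sextet(four[3]);
    // ... and split it back into three bytes.
    std::string out;
    out += static_cast<char>((group >> 16) & 0xffu);
    out += static_cast<char>((group >> 8) & 0xffu);
    out += static_cast<char>(group & 0xffu);
    return out;
}
```

For example, decode_quartet("TWFu") gives "Man".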

David Gelhar
1

Late to the party, but I came across this problem recently myself. I was unhappy with the BIO solution, which is unnecessarily convoluted, and I did not like 'EncodeBlock' either, because it introduces newline characters I do not want in my Base64 encoded string.

After a little sniffing, I came across the header file openssl/include/crypto/evp.h which is not part of the default installation (which only exports the include/openssl folder for me), but exports the solution to the problem.

void evp_encode_ctx_set_flags(EVP_ENCODE_CTX *ctx, unsigned int flags);

/* EVP_ENCODE_CTX flags */
/* Don't generate new lines when encoding */
#define EVP_ENCODE_CTX_NO_NEWLINES          1
/* Use the SRP base64 alphabet instead of the standard one */
#define EVP_ENCODE_CTX_USE_SRP_ALPHABET     2

Using this function, the 'no newline' becomes possible using the EVP interface.

Example:

if (EVP_ENCODE_CTX *context = EVP_ENCODE_CTX_new())
{
    EVP_EncodeInit(context);
    evp_encode_ctx_set_flags(context, EVP_ENCODE_CTX_NO_NEWLINES);
    while (hasData())
    {
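        uint8_t *data;
        int32_t length = fetchData(&data);
        // Worst-case output size for this call, counting the bytes still held in the context.
        int32_t size = (((EVP_ENCODE_CTX_num(context) + length)/48) * 65) + 1;
        std::vector<uint8_t> buffer(size); // needs <vector>; a variable-length array is not standard C++
        EVP_EncodeUpdate(context, buffer.data(), &size, data, length);
        //process encoded data.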
    }
    uint8_t buffer[65];
    int32_t writtenBytes;
    EVP_EncodeFinal(context, buffer, &writtenBytes);
    //Do something with the final remainder of the encoded string.
    EVP_ENCODE_CTX_free(context);
}

This piece of code will encode the buffer to Base64 without the newlines. Please note the use of EVP_ENCODE_CTX_num to obtain the 'leftover bytes' still stored in the context object to calculate the correct buffer size.

This is only necessary if you need to call EVP_EncodeUpdate multiple times, because your data is exceedingly large or not available all at once.
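
The buffer size computation from the example can be pulled out into a small function to make the arithmetic easier to follow. This is only a restatement of the formula used above, not an OpenSSL API; the name encode_update_bound and its parameters are hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the size computation in the example above.
// `leftover` is the number of bytes still buffered in the context (the value
// the example reads with EVP_ENCODE_CTX_num); `length` is the size of the new input.
std::size_t encode_update_bound(std::size_t leftover, std::size_t length)
{
    // Every complete run of 48 input bytes yields one line of 64 characters,
    // followed by a newline unless newlines are disabled.
    const std::size_t full_lines = (leftover + length) / 48;
    // The formula in the example adds one extra byte on top of the lines.
    return full_lines * 65 + 1;
}
```

For instance, 47 leftover bytes plus 1 new byte complete one line, so the bound is 66, while 47 new bytes on an empty context produce no output yet and the bound is 1.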

Dharman
Refugnic Eternium