Is it possible to decode base64 encoded data to binary data at compile-time?

I am thinking of something like this:

constexpr auto decoded = decodeBase64<"SGVsbG8=">();

or

constexpr auto decoded = decodeBase64("SGVsbG8=");

I have no special requirements for the resulting type of decoded.

Francis Cugler
Frank
  • `constexpr auto decoded = decodeBase64<"SGVsbG8=">();` - **no**, `const char[]` cannot be a non-type template parameter as of `C++17`. `constexpr auto decoded = decodeBase64("SGVsbG8=");` - **yes**, if `decodeBase64` takes `const char*` and is a `constexpr` function. – Fureeish Jan 04 '20 at 19:59
  • Just try making a simple decoder that takes the string as a regular argument, and put `constexpr` in front of it. It should work. If you run into more specific problems, ask again on Stack Overflow. – G. Sliepen Jan 04 '20 at 22:28
  • @Fureeish: It’s not that you can’t have a template *parameter* of that type (adjusted to `const char*` or via a pointer or reference to an array); you just can’t use a string literal as a template *argument* for it. – Davis Herring Jan 05 '20 at 18:27
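
A small sketch of the parameter/argument distinction from the last comment, assuming C++14. The names `decodedLength`, `Base64Info` and `kHello` are made up for illustration. A `const char*` template parameter is fine as long as the argument is a named array with static storage duration rather than a string literal:

```cpp
#include <cstddef>

// Decoded size of a padded, NUL-terminated base64 string (hypothetical helper).
constexpr std::size_t decodedLength(const char* s) {
    std::size_t len = 0;
    while (s[len] != '\0') ++len;
    std::size_t out = len / 4 * 3;
    if (len >= 1 && s[len - 1] == '=') --out;
    if (len >= 2 && s[len - 2] == '=') --out;
    return out;
}

// A template *parameter* of type const char* is allowed.
template <const char* Encoded>
struct Base64Info {
    static constexpr std::size_t size = decodedLength(Encoded);
};

// A namespace-scope constexpr array is a valid template *argument*.
constexpr char kHello[] = "SGVsbG8=";
static_assert(Base64Info<kHello>::size == 5, "\"Hello\" is five bytes");

// Base64Info<"SGVsbG8=">::size  // ill-formed: a string literal is not a valid argument
```

Because the pointer is a template argument, the string contents are available as a constant expression inside the template, which is what a function parameter cannot offer.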

2 Answers

I found it surprisingly hard to google for a constexpr base64 decoder, so I adapted the one here: https://gist.github.com/tomykaira/f0fd86b6c73063283afe550bc5d77594

Since that's MIT licensed, (sigh), be sure to slap this somewhere in the source file:

/**
 * The MIT License (MIT)
 * Copyright (c) 2016 tomykaira
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
 * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
 * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
 * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

To return a string from a constexpr function, you need to return a char array. A function can't return a built-in array, and std::string isn't usable in a constexpr function here, so std::array is the best option. But there is a problem: due to a standards oversight, the non-const operator[] of std::array is not constexpr until C++17, so you can't fill one element by element inside the function. You can work around that by inheriting and adding a constructor though:

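#include <array>
#include <cstddef>
#include <utility>

template <size_t N>
struct fixed_string : std::array<char, N> {
    // Delegates to the constructor below with the index pack 0..N-1.
    constexpr fixed_string(const char (&input)[N]) : fixed_string(input, std::make_index_sequence<N>{}) {}
    // Aggregate-initializes the base array with input[0], ..., input[N-1].
    template <size_t... Is>
    constexpr fixed_string(const char (&input)[N], std::index_sequence<Is...>) : std::array<char, N>{ input[Is]... } {}
};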
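
To make the pack expansion concrete, here is a self-contained sketch (C++14) that repeats the helper and checks it with static_assert. The variable names `abc` and `plain` are only for illustration:

```cpp
#include <array>
#include <cstddef>
#include <utility>

template <std::size_t N>
struct fixed_string : std::array<char, N> {
    constexpr fixed_string(const char (&input)[N])
        : fixed_string(input, std::make_index_sequence<N>{}) {}
    // For N == 4 this expands to std::array<char, 4>{ input[0], input[1], input[2], input[3] }.
    template <std::size_t... Is>
    constexpr fixed_string(const char (&input)[N], std::index_sequence<Is...>)
        : std::array<char, N>{ input[Is]... } {}
};

// "abc" has type const char[4]: three characters plus the terminator.
constexpr fixed_string<4> abc("abc");
static_assert(abc.size() == 4, "terminator is included");
static_assert(abc[0] == 'a' && abc[3] == '\0', "contents copied at compile time");

// Converting to the base class gives back a plain std::array.
constexpr std::array<char, 4> plain = abc;
static_assert(plain[1] == 'b', "data survives the conversion");
```

Reading through the const operator[] is fine in C++14; only writing through the non-const one is the problem before C++17.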

Change the decoder to use that instead of std::string, and it seems to work as constexpr. This requires C++14, because the body of a C++11 constexpr function is limited to a single return statement, which rules out the loop and local variables used here:

template <size_t N>
constexpr const std::array<char, ((((N-1) >> 2) * 3) + 1)> decode(const char(&input)[N]) {
    constexpr unsigned char kDecodingTable[] = {
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 62, 64, 64, 64, 63,
        52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64, 64, 64, 64, 64, 64,
        64,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
        15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 64, 64, 64, 64, 64,
        64, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
    };

    static_assert(((N-1) & 3) == 0, "Input data size is not a multiple of 4");

    char out[(((N-1) >> 2) * 3) + 1] {0};

    size_t out_len = (N-1) / 4 * 3;
    if (input[(N-1) - 1] == '=') out_len--;
    if (input[(N-1) - 2] == '=') out_len--;

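    for (size_t i = 0, j = 0; i < N-1;) {
      // `0 & i++` yields 0 for a padding character while still advancing i.
      // The unsigned char cast keeps bytes >= 0x80 from producing a negative table index.
      uint32_t a = input[i] == '=' ? 0 & i++ : kDecodingTable[static_cast<unsigned char>(input[i++])];
      uint32_t b = input[i] == '=' ? 0 & i++ : kDecodingTable[static_cast<unsigned char>(input[i++])];
      uint32_t c = input[i] == '=' ? 0 & i++ : kDecodingTable[static_cast<unsigned char>(input[i++])];
      uint32_t d = input[i] == '=' ? 0 & i++ : kDecodingTable[static_cast<unsigned char>(input[i++])];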

      uint32_t triple = (a << 3 * 6) + (b << 2 * 6) + (c << 1 * 6) + (d << 0 * 6);

      if (j < out_len) out[j++] = (triple >> 2 * 8) & 0xFF;
      if (j < out_len) out[j++] = (triple >> 1 * 8) & 0xFF;
      if (j < out_len) out[j++] = (triple >> 0 * 8) & 0xFF;
    }
    return fixed_string<(((N-1) >> 2) * 3) + 1>(out);
}

Usage:

constexpr auto x = decode("aGVsbG8gd29ybGQ=");
/*...*/
printf("%s\n", x.data()); // hello world

Demo: https://godbolt.org/z/HFdk6Z

Updated to address helpful feedback from Marek R and Frank.
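
One way to confirm that the decoding really happens at compile time is to check the result with static_assert. The following is a self-contained C++14 sketch, not the code above: `Decoded`, `b64Value` and `decodeChecked` are hypothetical names, the alphabet lookup uses arithmetic instead of a table (assuming an ASCII-compatible character set), and the result is a plain struct wrapping a C array, which a constexpr function is allowed to return:

```cpp
#include <cstddef>

// Hypothetical result type: buffer sized for the unpadded case plus the real length.
template <std::size_t Cap>
struct Decoded {
    unsigned char bytes[Cap + 1];  // +1 keeps the array non-empty and NUL-terminated
    std::size_t size;              // number of decoded bytes actually written
};

// Maps one base64 character to its 6-bit value, or -1 if it is not in the alphabet.
constexpr int b64Value(char c) {
    return (c >= 'A' && c <= 'Z') ? c - 'A'
         : (c >= 'a' && c <= 'z') ? c - 'a' + 26
         : (c >= '0' && c <= '9') ? c - '0' + 52
         : (c == '+') ? 62
         : (c == '/') ? 63
         : -1;
}

template <std::size_t N>
constexpr Decoded<(N - 1) / 4 * 3> decodeChecked(const char (&input)[N]) {
    static_assert((N - 1) % 4 == 0, "Input data size is not a multiple of 4");
    Decoded<(N - 1) / 4 * 3> out{};  // zero-initialized
    unsigned val = 0;                // bit accumulator; only the low bits are read
    int bits = -8;                   // becomes >= 0 once a full byte is available
    for (std::size_t i = 0; i + 1 < N; ++i) {
        const int v = b64Value(input[i]);
        if (v < 0) break;            // '=' padding or an invalid character ends decoding
        val = (val << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 0) {
            out.bytes[out.size++] = static_cast<unsigned char>((val >> bits) & 0xFF);
            bits -= 8;
        }
    }
    return out;
}

constexpr auto hello = decodeChecked("aGVsbG8gd29ybGQ=");
static_assert(hello.size == 11, "padding is excluded from the length");
static_assert(hello.bytes[0] == 'h' && hello.bytes[10] == 'd', "decoded at compile time");
```

The buffer still has to be sized from N alone, since the padding characters are function parameter content; the `size` member carries the exact length instead.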

parktomatomi
  • IMO the return value should be `std::array`, not a custom class. It should also be pointed out that this code requires `c++14`. – Marek R Jan 05 '20 at 20:11
  • Really nice, but I'm not sure if the size of the data in fixed_string is correct. It does not take into account if there is none, one or two '=' padding characters. – Frank Jan 05 '20 at 22:14
  • @MarekR for whatever reason, until C++17, `std::array`'s non-const index operator is not `constexpr`. You're right though, that would be much cleaner if you're using C++17 or above. – parktomatomi Jan 06 '20 at 00:50
  • @Frank Good point about the output length! I'll update the answer, but it'll still have to allocate the buffer using the simpler calculation because template arguments can't use function parameter content (e.g. the string). – parktomatomi Jan 06 '20 at 01:08
  • @DavisHerring Ideally, I would declare an `std::array`, use it, and return it, but the restriction means I have to build the output in a C array and copy that into an `std::array`. `std::array` doesn't have a constructor for this, or any user constructor; rather, it's aggregate-initialized like a C array. So all that the `fixed_string` helper class does now is add a constructor that transfers the contents of the C array to a brace-initializer for the `std::array`. The `constexpr` function still outputs an `std::array` though. – parktomatomi Jan 06 '20 at 14:51
  • @parktomatomi: Right, of course—I might have used a helper function rather than a class, even though it does convert back as you said. – Davis Herring Jan 07 '20 at 03:03

parktomatomi's answer helped me a lot in finding this solution. With C++17 and std::array, this seems to work.

The base64 decoder is based on the answer https://stackoverflow.com/a/34571089/3158571

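#include <array>
#include <cstddef>
#include <string>  // std::char_traits

// Number of bytes encoded by a padded base64 string.
constexpr size_t decodeBase64Length(const char *s)
{
    size_t len = std::char_traits<char>::length(s);
    if (len >= 2 && s[len - 2] == '=')
        return (len / 4) * 3 - 2;
    else if (len >= 1 && s[len - 1] == '=')
        return (len / 4) * 3 - 1;
    else
        return (len / 4) * 3;
}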

constexpr std::array<int, 256> prepareBase64DecodeTable() {
    // T.fill(-1) or leaving T uninitialized is not allowed in a C++17 constexpr
    // function, so zero-initialize here and overwrite in the loop below.
    std::array<int, 256> T{ 0 };
    for (int i = 0; i < 256; i++)
        T[i] = -1;
    for (int i = 0; i < 64; i++)
        T["ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[i]] = i;
    return T;
}

// based on https://stackoverflow.com/a/34571089/3158571
template<int N>
constexpr std::array<std::byte, N> decodeBase64(const char *b64Str)
{
    constexpr auto T = prepareBase64DecodeTable();
    std::array<std::byte, N> out = { std::byte(0) };
    int valb = -8;
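    // The unsigned char casts keep bytes >= 0x80 from producing a negative table index.
    for (size_t i = 0, val = 0, posOut = 0; i < std::char_traits<char>::length(b64Str) && T[static_cast<unsigned char>(b64Str[i])] != -1; i++) {
        val = (val << 6) + T[static_cast<unsigned char>(b64Str[i])];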
        valb += 6;
        if (valb >= 0) {
            out[posOut++] = std::byte((val >> valb) & 0xFF);
            valb -= 8;
        }
    } 
    return out;
}

Usage is not perfect, as I cannot deduce the length of the resulting array without passing it explicitly as a template argument:

#define B64c "SGVsbG8xMg=="
constexpr auto b64 = decodeBase64<decodeBase64Length(B64c)>(B64c);  // array<byte,7>

Demo at https://godbolt.org/z/-DX2-m
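
One possible way around the explicit length, sketched here under the assumption of C++17: pass the literal wrapped in a captureless lambda. Calling such a lambda is a constant expression even when the lambda itself is a function parameter, so the exact size can be computed inside the function. The names `b64Value`, `decodedLength` and `decodeBase64Exact` are hypothetical, and the sketch uses `unsigned char` instead of `std::byte` to keep the checks short:

```cpp
#include <array>
#include <cstddef>

// Maps one base64 character to its 6-bit value, or -1 if it is not in the alphabet.
// Assumes an ASCII-compatible character set.
constexpr int b64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decoded size of a padded, NUL-terminated base64 string.
constexpr std::size_t decodedLength(const char* s) {
    std::size_t len = 0;
    while (s[len] != '\0') ++len;
    std::size_t out = len / 4 * 3;
    if (len >= 1 && s[len - 1] == '=') --out;
    if (len >= 2 && s[len - 2] == '=') --out;
    return out;
}

// `literal` is a captureless lambda that returns the string literal.
template <typename F>
constexpr auto decodeBase64Exact(F literal) {
    constexpr const char* s = literal();         // constant expression despite being a parameter
    constexpr std::size_t n = decodedLength(s);  // so the exact size can be a template argument
    std::array<unsigned char, n> out{};
    unsigned val = 0;
    int bits = -8;
    std::size_t pos = 0;
    for (std::size_t i = 0; s[i] != '\0' && pos < n; ++i) {
        const int v = b64Value(s[i]);
        if (v < 0) break;                        // padding or invalid character
        val = (val << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 0) {
            out[pos++] = static_cast<unsigned char>((val >> bits) & 0xFF);
            bits -= 8;
        }
    }
    return out;
}

constexpr auto hello12 = decodeBase64Exact([] { return "SGVsbG8xMg=="; });
static_assert(hello12.size() == 7, "exact size, no explicit template argument");
static_assert(hello12[0] == 'H' && hello12[6] == '2', "decodes to Hello12");
```

The call site is noisier than a plain string argument, but it avoids both the macro and the repeated literal.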

Frank
  • Instead of `const char *`, use a reference to an array to deduce the length from the argument: `const char (&b64Str)[N]`. – parktomatomi Jan 06 '20 at 00:40