base64 encoding removing carriage return from dos header

Question

I have been trying to encode the binary data of an application as base64 (specifically Boost's base64), but I have run into an issue where the carriage return after the DOS header is not being encoded correctly.

It should look like this:

This program cannot be run in DOS mode.[CR]
[CR][LF]

But instead it's output like this:

This program cannot be run in DOS mode.[CR][LF]

It seems this first carriage return is being skipped, which then makes the DOS header invalid when attempting to run the program.

The code for the base64 algorithm I am using can be found at: https://www.boost.org/doc/libs/1_66_0/boost/beast/core/detail/base64.hpp

Thanks so much!

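#include <boost/beast/core/detail/base64.hpp>
#include <cstdio>
#include <cstring>
#include <fstream>
namespace base64 = boost::beast::detail::base64;

void load_file(const char* filename, char** file_out, size_t& size_out)
{
    FILE* file = nullptr;
    fopen_s(&file, filename, "r");
    if (!file)
        return;

    fseek(file, 0, SEEK_END);
    size_out = ftell(file);
    rewind(file);

    *file_out = new char[size_out];
    fread(*file_out, size_out, 1, file);
    fclose(file);
}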

void some_func()
{
    char* file_in;
    size_t file_in_size;
    load_file("filename.bin", &file_in, file_in_size);
    auto encoded_size = base64::encoded_size(file_in_size);
    auto file_encoded = new char[encoded_size];
    memset(file_encoded, 0, encoded_size);
    base64::encode(file_encoded, file_in, file_in_size);
    std::ofstream orig("orig.bin", std::ios_base::binary);
    for (size_t i = 0; i < file_in_size; i++)
    {
        auto c = file_in[i];
        orig << c; // DOS header contains a NULL as the 3rd char; don't let it be null-terminated early. May cause trailing nulls, but does not affect binary files.
    }
    orig.close();
    std::ofstream encoded("encoded.txt"); // pass this output through a base64-to-file website.
    encoded << file_encoded; // no loop required: contains no nulls (besides the ending null); will contain trailing encoded nulls.
    encoded.close();
    auto decoded_size = base64::decoded_size(encoded_size);
    auto file_decoded = new char[decoded_size];
    memset(file_decoded, 0, decoded_size); // again trailing nulls, but it doesn't matter for binary file operation, just wasted disk space.
    base64::decode(file_decoded, file_encoded, encoded_size);
    std::ofstream decoded("decoded.bin", std::ios_base::binary);
    for (size_t i = 0; i < decoded_size; i++)
    {
        auto c = file_decoded[i];
        decoded << c;
    }
    decoded.close();
    delete[] file_in;
    delete[] file_encoded;
    delete[] file_decoded;
}

The above code shows that reading the file does not remove the carriage return, while encoding it as base64 does.

Please put your program's code into the question, not a link or an image. — Slava, Apr 27 '22 at 05:41
Just a shot in the dark: if you load the file with `std::fstream`, please ensure that `std::ios::binary` is set. It might be that the double `CR` gets lost when you load the binary contents, not when it's base64 encoded. Reading binary files correctly on Windows is a very common issue. I struggle to believe that the Boost function is broken. — Scheff's Cat, Apr 27 '22 at 05:47
Yes, lacking binary mode on the input and/or the output seems the most likely cause. Please provide a minimal reproducible example. — Alan Birtles, Apr 27 '22 at 06:03
Note that using detail namespaces is not supported. You might be missing preconditions/limitations of the code you're appropriating — sehe, Apr 27 '22 at 13:33
I have already tested my file reading code and it does not remove the carriage return; only base64 encoding removes it. The Boost base64 namespace is completely independent of the rest of Boost and does not require any other "preconditions / limitations". I should not need to provide a minimal example, as any binary program loaded through the "base64::encode()" function loses this carriage return, so the issue should be findable in the code itself. Though if you would like to test it yourself, simply compile your program, fread it into memory as bytes, then convert it to base64 using the above. — Chris, Apr 27 '22 at 15:18
I'd love for this to be reopened after you added the code, because I figured out the problem and also can reduce your code by about 50% making it correct. /cc @AlanBirtles — sehe, Apr 27 '22 at 16:37
@sehe I'm not sure this does need re-opening? It'd just then need closing as a duplicate of the many using non-binary mode with binary files and using formatted stream functions with binary data — Alan Birtles, Apr 27 '22 at 17:38
You need to use binary mode with `fopen_s` (`"rb"` instead of `"r"`). `orig << c` shouldn't be used with binary data; use `read` and `write` instead. That loop can just be `orig.write(file_decoded, decoded_size)` — Alan Birtles, Apr 27 '22 at 17:49
@AlanBirtles Yeah, you couldn't know that was a red herring (because the code was added later). Thanks for the help! — sehe, Apr 27 '22 at 20:15
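The binary-mode explanation in these comments accounts for the exact symptom in the question. In text mode ("r"), the Microsoft C runtime delivers each CR LF pair to the program as a single LF, so the stub's "mode.[CR][CR][LF]" arrives as "mode.[CR][LF]". Below is a rough, simplified model of that translation (the helper `text_mode_read` is hypothetical and ignores the CRT's other text-mode rules); on POSIX systems "r" and "rb" behave identically, which is why this only shows up on Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of Windows text-mode input: every "\r\n" pair in the
// raw file is handed to the program as a single "\n". Other bytes pass through.
std::string text_mode_read(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n')
            continue; // drop this CR; the following LF is kept on the next pass
        out.push_back(raw[i]);
    }
    return out;
}
```

Feeding it "mode.\r\r\n$" yields "mode.\r\n$": one CR disappears, exactly the [CR][LF] reported in the question. Because `ftell` still reported the raw size, the buffer also ends up with bytes that `fread` never filled.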

score 1 · Accepted Answer · answered Apr 27 '22 at 20:13

Okay, thanks for adding the code!

I tried it, and indeed there was "strangeness", even after I simplified the code (mostly to make it C++, instead of C).

So what do you do? You look at the documentation for the functions. That seems complicated since, after all, detail::base64 is, by definition, not part of the public API, and "undocumented".

However, you can still read the comments at the functions involved, and they are pretty clear:

/** Encode a series of octets as a padded, base64 string.

    The resulting string will not be null terminated.

    @par Requires

    The memory pointed to by `out` points to valid memory
    of at least `encoded_size(len)` bytes.

    @return The number of characters written to `out`. This
    will exclude any null termination.
*/
std::size_t
encode(void* dest, void const* src, std::size_t len)

And

/** Decode a padded base64 string into a series of octets.

    @par Requires

    The memory pointed to by `out` points to valid memory
    of at least `decoded_size(len)` bytes.

    @return The number of octets written to `out`, and
    the number of characters read from the input string,
    expressed as a pair.
*/
std::pair<std::size_t, std::size_t>
decode(void* dest, char const* src, std::size_t len)
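The "will not be null terminated" note matters for the question's `encoded << file_encoded` line: streaming a `char*` keeps reading until it happens to hit a zero byte, and the buffer was allocated with exactly `encoded_size` bytes, leaving no room for one. A minimal sketch of the alternative, assuming you keep the count that `encode` returns (the helpers `write_encoded` and `encoded_to_string` are hypothetical names):

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Write exactly `written` characters of a buffer filled by an encoder that
// does not null-terminate. The length, not a '\0', bounds the output.
void write_encoded(std::ostream& os, const char* buf, std::size_t written) {
    os.write(buf, static_cast<std::streamsize>(written));
}

// Convenience wrapper for inspecting the result in memory.
std::string encoded_to_string(const char* buf, std::size_t written) {
    std::ostringstream oss;
    write_encoded(oss, buf, written);
    return oss.str();
}
```

For example, a 4-character buffer `{'Q', 'U', 'J', 'D'}` (the base64 encoding of "ABC") with no terminator is written as exactly "QUJD".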

Conclusion: What Is Wrong?

Nothing about "DOS headers" or "carriage returns". Perhaps something about "rb" in fopen (see "What's the difference between r and rb in fopen"), but why even use that:

template <typename Out> Out load_file(std::string const& filename, Out out) {
    std::ifstream ifs(filename, std::ios::binary); // or "rb" on your fopen
    ifs.exceptions(std::ios::failbit |
                   std::ios::badbit); // we prefer exceptions
    return std::copy(std::istreambuf_iterator<char>(ifs), {}, out);
}
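If you would rather keep the C stdio approach from the question, the equivalent fix is "rb" plus trusting how many bytes `fread` actually delivered. A sketch under those assumptions, using standard `std::fopen` instead of the MSVC-specific `fopen_s`, with the hypothetical name `load_file_c`:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stdio variant: binary mode ("rb") means no CR LF translation,
// and the vector is sized by fread's return value, not by ftell alone.
bool load_file_c(const char* filename, std::vector<char>& out) {
    std::FILE* file = std::fopen(filename, "rb");
    if (!file)
        return false;
    std::fseek(file, 0, SEEK_END);
    long size = std::ftell(file);
    std::rewind(file);
    if (size < 0) { // ftell failed
        std::fclose(file);
        return false;
    }
    out.resize(static_cast<std::size_t>(size));
    std::size_t got = std::fread(out.data(), 1, out.size(), file);
    std::fclose(file);
    out.resize(got); // keep only the bytes that were really read
    return true;
}
```

Passing element size 1 and count `out.size()` (rather than the question's `size, 1`) makes `fread` return a byte count, which is what the final `resize` needs.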

The real issue is: your code ignored all return values from encode/decode.

The encoded_size and decoded_size values are upper-bound estimates that give you enough space to store the result, but you have to shrink the buffer to the actual size after performing the encoding/decoding.
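To make the upper bound concrete: base64 turns every group of 3 input bytes into 4 output characters and pads the last group with `=`. The sketch below restates the usual size formulas (`padded_encoded_size`, `max_decoded_size`, `actual_decoded_size` and `padding_for` are hypothetical names, written on the assumption that Boost's helpers compute the same quantities) and shows that the decode estimate overshoots by exactly the number of padding characters. Ignoring the return value therefore leaves up to two extra trailing bytes.

```cpp
#include <cstddef>

// 4 output characters per started group of 3 input bytes (padded output).
constexpr std::size_t padded_encoded_size(std::size_t n) { return 4 * ((n + 2) / 3); }

// Upper bound for decoding `len` characters: 3 bytes per group of 4.
constexpr std::size_t max_decoded_size(std::size_t len) { return len / 4 * 3; }

// Number of '=' characters a padded encoding of n input bytes ends with.
constexpr std::size_t padding_for(std::size_t n) { return (3 - n % 3) % 3; }

// Bytes really produced once the '=' padding is accounted for.
constexpr std::size_t actual_decoded_size(std::size_t len, std::size_t padding) {
    return max_decoded_size(len) - padding;
}
```

For a 4-byte input the encoded text is 8 characters ending in "==", `max_decoded_size(8)` is 6, and only 4 of those bytes are real data.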

Here's my fixed and simplified example. Notice how the md5sums check out:


#include <boost/beast/core/detail/base64.hpp>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
namespace base64 = boost::beast::detail::base64;

template <typename Out> Out load_file(std::string const& filename, Out out) {
    std::ifstream ifs(filename, std::ios::binary); // or "rb" on your fopen
    ifs.exceptions(std::ios::failbit |
                   std::ios::badbit); // we prefer exceptions
    return std::copy(std::istreambuf_iterator<char>(ifs), {}, out);
}

int main() {
    std::vector<char> input;
    load_file("filename.bin", back_inserter(input));

    // allocate "enough" space, using an upperbound prediction:
    std::string encoded(base64::encoded_size(input.size()), '\0');

    // encode returns the **actual** encoded_size:
    auto encoded_size = base64::encode(encoded.data(), input.data(), input.size());
    encoded.resize(encoded_size); // so adjust the size

    std::ofstream("orig.bin", std::ios::binary)
        .write(input.data(), input.size());
    std::ofstream("encoded.txt") << encoded;

    // allocate "enough" space, using an upperbound prediction:
    std::vector<char> decoded(base64::decoded_size(encoded_size), 0);

    auto [decoded_size, // decode returns the **actual** decoded_size
          processed]    // (as well as number of encoded bytes processed)
        = base64::decode(decoded.data(), encoded.data(), encoded.size());
    decoded.resize(decoded_size); // so adjust the size

    std::ofstream("decoded.bin", std::ios::binary)
        .write(decoded.data(), decoded.size());
}

When run on itself using

g++ -std=c++20 -O2 -Wall -pedantic -pthread main.cpp -o filename.bin && ./filename.bin
md5sum filename.bin orig.bin decoded.bin
base64 -d < encoded.txt | md5sum

It prints

d4c96726eb621374fa1b7f0fa92025bf  filename.bin
d4c96726eb621374fa1b7f0fa92025bf  orig.bin
d4c96726eb621374fa1b7f0fa92025bf  decoded.bin
d4c96726eb621374fa1b7f0fa92025bf  -
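On a system without `md5sum`, or when you want to know where two files diverge rather than just whether they do (similar to inspecting them in a hex viewer), a byte-by-byte comparison also works. A minimal sketch with a hypothetical `first_mismatch` helper: it returns the offset of the first differing byte, or -1 if the files are identical.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file as raw bytes (binary mode, so nothing is translated).
static std::vector<char> read_bytes(const std::string& path) {
    std::ifstream ifs(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(ifs),
                             std::istreambuf_iterator<char>());
}

// Offset of the first byte where the files differ, or -1 if they are equal.
// If one file is a prefix of the other, the mismatch is at the shorter length.
long long first_mismatch(const std::string& a, const std::string& b) {
    const std::vector<char> x = read_bytes(a);
    const std::vector<char> y = read_bytes(b);
    const std::size_t n = std::min(x.size(), y.size());
    auto diff = std::mismatch(x.begin(), x.begin() + n, y.begin());
    if (diff.first != x.begin() + n)
        return static_cast<long long>(diff.first - x.begin());
    return x.size() == y.size() ? -1 : static_cast<long long>(n);
}
```

Comparing `filename.bin` with a text-mode-damaged copy would report the offset of the second CR in the DOS stub, rather than just a different hash.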

Just for further info: this did solve my problem, but I was actually using the return value of encode/decode; I just didn't include it in the pseudocode. My output did still seem to contain extra nulls, so I wasn't using hash sums to check; instead I was looking at the output in a hex viewer, and concluded the problem was a missing carriage return. I think this problem was being introduced through some misuse of C strings on my end; either way, thanks for the help! — Chris, Apr 29 '22 at 21:41
"I was actually using the return value of encode/decode; I just didn't include it in the pseudocode" - not much use telling us this. In SO questions, the code in the question is the real code :) Cheers — sehe, Apr 29 '22 at 23:32
