
Consider base64_decode from http://www.adp-gmbh.ch/cpp/common/base64.html

std::string base64_decode(std::string const& encoded_string)

The function is supposed to return a byte array, since the decoded result is binary data. However, it returns std::string. My guess is that the author is trying to avoid explicit dynamic memory allocation.

I tried to verify that the output is correct.

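#include <cstddef>
#include <cstdio>
#include <iostream>
#include <string>
#include "base64.h"  // assumed: header from the linked page declaring base64_encode/base64_decode

int main()
{
    unsigned char data[3];
    data[0] = 0; data[1] = 1; data[2] = 2;
    std::string encoded_string = base64_encode(data, 3);
    // AAEC
    std::cout << encoded_string << std::endl;

    std::string decoded_string = base64_decode(encoded_string);
    for (std::size_t i = 0; i < decoded_string.length(); i++) {
        // 0, 1, 2 (cast through unsigned char so bytes >= 0x80 don't print as negative numbers)
        std::cout << static_cast<int>(static_cast<unsigned char>(decoded_string[i])) << ", ";
    }
    std::cout << std::endl;
    getchar();  // keep the console window open
}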

The decoded output is correct. I just want to confirm: is it valid to use std::string to hold binary data, so as to avoid manual dynamic memory management?

std::string s;
s += (char)0;
// s.length() will return 1.
Cheok Yan Cheng

6 Answers


Yes, you can store any sequence of char in a std::string. That includes any binary data.

CB Bailey
  • And more specifically, that includes '\0' characters. – Michael Burr Jan 11 '11 at 07:35
  • Of course, don't go around using `c_str()` in this case (well, don't expect the `\0` terminator to be an accurate indication of end-of-string anyway). – bdonlan Jan 11 '11 at 07:36
  • @bdonlan: You can use `c_str()` if you want to, but you probably need to consult `size()` as well. I prefer `data()` in this scenario; it feels more idiomatic. (`c_str()` and `data()` will do _exactly_ the same thing in C++0x.) – CB Bailey Jan 11 '11 at 07:38
  • @CharlesBailey If string allows special characters (including \0), then why does the code below print only 'Hello'? #include using namespace std; int main() { string str = "Hello \0 World"; cout << str << endl; cout << str.data() << endl; return 0; } – programmer Nov 17 '15 at 05:21
  • @programmer: You should ask that as a new question. (Basically you've chosen a constructor that only reads up to the first NUL for your string.) – CB Bailey Nov 17 '15 at 06:11

Yes. std::string can hold any char value ('\0' has no special meaning). However, I wouldn't be surprised to find some C++ functions (e.g. from external libraries) that have problems with strings containing embedded NULs.

Anyway, I don't understand what you gain by using std::string instead of std::vector<unsigned char>, which would make your intentions clearer and offers more guarantees (e.g. that all the bytes are in contiguous, non-shared memory, so you can pass &x[0] to someone expecting a plain buffer for direct access).

6502
  • Having all bytes contiguous is not necessarily an advantage. Just a thought. – CB Bailey Jan 11 '11 at 07:39
  • No, it's not me. The API was originally designed by another author. Ask him! :) – Cheok Yan Cheng Jan 11 '11 at 07:41
  • @Charles: When dealing with bytes/buffers, it is normal to assume the payload is contiguous. – YeenFei Jan 11 '11 at 07:45
  • In C++0X, `std::string` is guaranteed to be contiguous, and all known implementations of C++03 are contiguous. See http://herbsutter.com/2008/04/07/cringe-not-vectors-are-guaranteed-to-be-contiguous/ – dalle Jan 11 '11 at 07:49
  • @Charles Bailey: True, but in my experience buffer-type access is more common with binary data. Also note that you have no guarantee that a string implementation uses COW, so if you **need** that (e.g. for memory requirements) then you need to roll your own. With std::string you may or may not have COW (and Murphy says that you will get the opposite of what you need, especially if it is critical for you :-) ) – 6502 Jan 11 '11 at 07:51
  • @dalle: This is true, which makes my original point academic (even if valid). – CB Bailey Jan 11 '11 at 07:51
  • @6502: If your `std::string` implementation uses COW but doesn't ensure a unique buffer when you retrieve a modifiable reference to a contained `char`, then it is flawed in any case. My point was that if you have a very large string, then chopping it into long subsequences can be kinder on your address space requirements. I wasn't really driving at COW / non-COW differences. – CB Bailey Jan 11 '11 at 07:53
  • In C++0X, `std::string` will probably not be allowed to utilize COW: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html – dalle Jan 11 '11 at 08:02
  • @dalle: I would not be surprised if they went without. There have been experiments showing that COW was more of a hurdle in multi-threaded environments (because of the synchronization required). – Matthieu M. Jan 11 '11 at 08:26
  • FYI, Google also uses a similar technique: http://code.google.com/apis/protocolbuffers/docs/proto.html#scalar – Cheok Yan Cheng Jan 11 '11 at 09:34
  • @6502: This is 'Sod's Law', not Murphy's. Murphy's Law: Everything that can go wrong will go wrong. Sod's Law: When things go wrong, they will go wrong in the worst possible way. – Martin York Jan 11 '11 at 15:08
  • If string allows special characters (including \0), then why does the code below print only 'Hello'? #include using namespace std; int main() { string str = "Hello \0 World"; cout << str << endl; cout << str.data() << endl; return 0; } – programmer Nov 17 '15 at 05:20
  • @programmer: For reasons that are not clear to me the standard `std::string` class only accepts a `const char *` for the constructor (not an array of char) and this means that the string literal `"Hello\0world"` will decay into a pointer before being used to build the string and the constructor will stop at the first `'\0'`. In other words while C++ strings have no problems containing `NUL` characters you cannot pass them using a string literal. I don't know why the standard didn't implement initialization from arrays (see http://stackoverflow.com/q/33751291/320726 ) – 6502 Nov 17 '15 at 07:27
  • @programmer: found the answer to why C++ doesn't handle initialization from string literals with embedded `NUL`s. String literals are arrays of chars that however include the ending `NUL`. Initializing a string from an array of chars without crazy special cases would need to include also that `NUL` and this is clearly not what would be useful. C++14 added a special `s` suffix that solves the problem (i.e. `std::string x = "Hello\0world."s;` builds a string with an embedded `NUL` but no extra `NUL` at the end). – 6502 Nov 17 '15 at 08:09

I don't think it's completely valid. Care must be taken when using std::string for binary data, because it uses char internally, and whether char is signed or unsigned is implementation-defined. I prefer to use basic_string<unsigned char> and be sure of what I'm reading.

HernanBailo

I don't think one should use std::string for storing byte data. The methods it provides aren't designed to deal with byte data, and you put yourself at risk, since any change (or "optimization") to std::string could break your code.

YeenFei

It is better to use std::vector or std::vector<byte> (where byte is a typedef for uint8) to express the nature of the data. You will no longer have string-specific functions available, which is what you want for binary data.

Marian K.

You need an array of characters (not a string) to store binary data. The best option is to use a vector.

CrazyC