27

I have a function that encodes strings and must accept 0x00 as a valid 'byte'. My program needs to check the length of the string, but if I pass "\x00" to std::string, the length() method returns 0.

How can I get the actual length even if the string is a single null character?

Adamski
  • 4
    Have a look at the [available constructors](http://en.cppreference.com/w/cpp/string/basic_string/basic_string) and which is used in your case. – chris Jan 14 '18 at 23:41
  • 9
    You could also try `strlen("\x00");` for the same result. – Bo Persson Jan 15 '18 at 01:36
  • 1
    See also: https://stackoverflow.com/questions/48210211/access-violation-when-sending-a-0-int-literal-to-a-const-string-parameter/ – M.M Jan 15 '18 at 02:52
  • Would you not be better to store a vector (other containers are available) of bytes instead of a string? – Jack Aidley Jan 15 '18 at 10:14
  • 1
    @JackAidley the data is coming in as a string, once processed it is stored as a vector of bytes. – Adamski Jan 15 '18 at 10:56
  • 1
    @BoPersson, `strlen(3)` is **not** a C++ function. It's a legacy C function that knows nothing about the C++ `string` type. You can only use it with a `string` by first converting the `string` to a legacy C `char *` string. Even then, `strlen(3)` doesn't know about array sizes; it only searches for the `\0` char and returns the difference between the pointer passed to it and the place where it found the null char. – Luis Colorado Jan 16 '18 at 08:58

3 Answers

47

std::string is perfectly capable of storing nulls. However, you have to be wary, because const char* is not, and your code briefly goes through a const char* from which the std::string is created.

std::string a("\x00");

This creates a constant C string containing only the null character, followed by a null terminator. But a C string passed as a const char* does not carry its length, so the std::string constructor assumes it runs until the first null terminator, which is the first character. Hence, a zero-length string is created.

std::string b("");
b.push_back('\0');

std::string is null-clean: any of its characters may be the zero byte (\0). Here the null is appended directly, without going through a C string, so nothing cuts the data short. The length of b will be 1.
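To make the difference between the two constructions above concrete, here is a minimal sketch. The helper names (literal_array_size, c_string_length, and so on) are made up for illustration; only std::string, std::strlen and sizeof come from the language and standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// The literal itself is an array of two chars: the explicit \x00 plus the
// implicit terminator. The compiler still knows this size.
constexpr std::size_t literal_array_size = sizeof("\x00");

// Seen through a const char*, the length is found by scanning for the first
// '\0', which here is the very first character.
inline std::size_t c_string_length() {
    return std::strlen("\x00");
}

// The std::string(const char*) constructor performs the same kind of scan,
// so the resulting string is empty.
inline std::size_t from_c_string_length() {
    std::string a("\x00");
    return a.length();
}

// Appending the byte directly never goes through a C string.
inline std::size_t push_back_length() {
    std::string b;
    b.push_back('\0');
    return b.length();
}
```

Under these assumptions, literal_array_size is 2, both C-string-based lengths are 0, and push_back_length() returns 1.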

In general, you need to avoid constructing C strings containing null characters. If you read the input from a file directly into std::string or make sure to push the characters one at a time, you can get the result you want. If you really need a constant string with null characters, consider using some other sentinel character instead of \0 and then (if you really need it) replace those characters with '\0' after loading into std::string.
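As a rough sketch of the two approaches described above: the first helper relies on the standard std::string(const char*, size_t) constructor, which takes an explicit byte count instead of scanning for a terminator; the second swaps a placeholder for '\0' after the string has been built. The function names and the '#' placeholder used in the usage note are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Build a string from a buffer plus an explicit byte count, so embedded
// null bytes are copied like any other character.
inline std::string from_bytes(const char* data, std::size_t count) {
    return std::string(data, count);
}

// Hypothetical helper: the constant is written with a sentinel character,
// which is replaced by '\0' once the data is inside a std::string.
inline std::string replace_sentinel(std::string text, char sentinel) {
    std::replace(text.begin(), text.end(), sentinel, '\0');
    return text;
}
```

For example, from_bytes(raw, sizeof raw) on a three-byte array 'a', '\0', 'b' should keep all three bytes, and replace_sentinel("a#b#", '#') should yield a four-character string with nulls at positions 1 and 3. The sentinel approach only works if the placeholder can never occur in the real data.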

Silvio Mayolo
  • 24
    _"But C strings don't know how long they are"_ -- To be more precise, `std::string("\x00")` first creates a *string literal* of type `const char[2]`, so at this point the size is still well known. Though this array decays into `const char*` which is passed to the `std::string` constructor. At this point the array size is "lost" because the constructor can only scan for the 1st `\0` to determine the size. In theory `std::string` c'tor could have an overload for arrays, that would allow embedded `\0` in string literal. – zett42 Jan 15 '18 at 00:17
  • 1
    @zett42 such an overload would have to be templated, and be instantiated for each new length of the array — there's no other native way to pass sized arrays in C++. – Ruslan Jan 15 '18 at 06:28
  • 5
    @Ruslan So? That would have been perfectly acceptable. Of course it would also have been silly, as it’d break the C string literal convention and thus violates the user’s expectation in most cases (nobody wants to find a null char in their string when initialising it as `std::string("hi")`). – Konrad Rudolph Jan 15 '18 at 11:13
  • @KonradRudolph So, that would generate a new function for each size of string literals passed to `std::string::string`. Not perfect from code size perspective; you'd only have to hope the linker will omit these functions and the compiler inline their code into callers. – Ruslan Jan 15 '18 at 13:40
  • @Ruslan Code inlining wouldn’t reduce code size. On the contrary, to help with generated code size the constructor template could dispatch to a size-erased non-generic function that isn’t inlined. But for this particular constructor, inlining probably works just fine, and results in the same code size regardless of whether you’d have a constructor template or a non-template constructor (since it’s inlined either way). – Konrad Rudolph Jan 15 '18 at 13:51
  • @KonradRudolph inlining won't, but creation of additional _callable_ instance of the function will. – Ruslan Jan 15 '18 at 13:53
  • @zett42 We already have `std::string (const char* s, size_t n);` overload, and it can handle dynamically allocated `char` arrays. Calling it as `my_str("\0", ARRAYSIZE("\0"));` is usually a minor inconvenience. – Joker_vD Jan 15 '18 at 14:17
  • @Joker_vD Agreed. I wasn't saying that an array overload would actually make sense. I mentioned it solely to underline the fact that the compiler knows the size of a string literal, whether or not it has embedded `\0`s. – zett42 Jan 15 '18 at 15:52
  • I think you wanted to type `b.push_back('\0');` instead of `a.push_back('\0');`. Apart from this, great answer! – Fabio says Reinstate Monica Jan 15 '18 at 22:18
  • @FabioTurati It's always the little things that slip by, isn't it? Thanks! :) – Silvio Mayolo Jan 15 '18 at 23:41
30

You're passing in an empty string. Use std::string(1, '\0') instead.

Or std::string{ '\0' } (thanks, @zett42)
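Both forms can be checked with a small sketch like the one below (helper names are illustrative). It also shows one thing to watch for when mixing the two syntaxes: with braces, a leading number is treated as a character in the initializer list, not as a count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count constructor: one copy of '\0'.
inline std::string one_null_by_count() {
    return std::string(1, '\0');
}

// Initializer-list constructor: a list containing the single char '\0'.
inline std::string one_null_by_init_list() {
    return std::string{'\0'};
}

// Parentheses select the (count, char) constructor: three null bytes.
inline std::size_t paren_count_length() {
    return std::string(3, '\0').size();
}

// Braces select the initializer list: the chars with values 3 and 0.
inline std::size_t brace_pair_length() {
    return std::string{3, '\0'}.size();
}
```

Here both one-null helpers produce a string of length 1, while paren_count_length() is 3 and brace_pair_length() is 2.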

Sid S
24

With C++14, you can use a string literal operator to store strings with null bytes:

using namespace std::string_literals;

std::string a = "\0"s;
std::string aa = "\0\0"s; // two null bytes are supported too
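A small sketch of how the resulting lengths could be checked, assuming a C++14 or later compiler. The using-directive is placed inside each function here, which is one way to keep the literal operator out of the surrounding scope; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The s suffix receives the literal's length from the compiler, so the
// embedded null byte is kept.
inline std::size_t single_null_length() {
    using namespace std::string_literals;  // visible only in this function
    return "\0"s.size();
}

// Nulls in the middle of a literal are preserved as well.
inline std::size_t embedded_null_length() {
    using namespace std::string_literals;
    return "ab\0cd"s.size();
}
```

With these definitions, single_null_length() is 1 and embedded_null_length() is 5.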
Erbureth
  • 5
    @sp2danny ... but may also bring in other, unwanted literal operators (e.g. from `std::literals::chrono_literals`). – Toby Speight Jan 15 '18 at 15:08