3

I'm relatively new to C++ and I'm still getting to grips with the C++ Standard Library. To help transition from C, I want to format a std::string using printf-style formatters. I realise stringstream is a more type-safe approach, but I find printf-style much easier to read and deal with (at least, for the time being). This is my function:


using namespace std;

string formatStdString(const string &format, ...)
{
    va_list va;
    string output;
    size_t needed;
    size_t used;

    va_start(va, format);
    needed = vsnprintf(&output[0], 0, format.c_str(), va);
    output.resize(needed + 1); // for null terminator??
    va_end(va);    

    va_start(va, format);
    used = vsnprintf(&output[0], output.capacity(), format.c_str(), va);
    // assert(used == needed);
    va_end(va);

    return output;
}

This works, kinda. A few things that I am not sure about are:

  1. Do I need to make room for a null terminator, or is this unnecessary?
  2. Is capacity() the right function to call here? I keep thinking length() would return 0 since the first character in the string is a '\0'.

Occasionally while writing this string's contents to a socket (using its c_str() and length()), I have null bytes popping up on the receiving end, which is causing a bit of grief, but they seem to appear inconsistently. If I don't use this function at all, no null bytes appear.

dreamlax
  • 12
    If you are learning C++ and the STL, why don't you learn its proper use? The reason stringstreams and string are preferred over printf and char* is to address the problems you're having with C-style strings, null terminators and the like. Using printf is handy, but there is a very good reason that very smart people came up with a different way to handle formatting in C++. – Alan May 21 '10 at 07:36
  • 10
    Consider using [`boost::format`](http://www.boost.org/doc/libs/1_43_0/libs/format/index.html). It has nice syntax and it is simple to use. – Kirill V. Lyadvinsky May 21 '10 at 07:45
  • 3
    +1 for @Alan's comment. It is better to learn the idioms of a programming language rather than try to port the nuances of another. – johnsyweb May 21 '10 at 07:51
  • For several different approaches to this, see http://stackoverflow.com/questions/2552839/which-c-standard-library-wrapper-functions-do-you-use/2552973#2552973 –  May 21 '10 at 07:52
  • @Neil: There's got to be an example using variadic templates somewhere out there. – sbi May 21 '10 at 07:58
  • I will try to use stringstreams from now on, but do I really have to call a function each time I want to change the field width, field output type (hex/decimal) and fill? – dreamlax May 21 '10 at 08:06
  • My boss at a previous company forced us to write "C++" style perl. Made me want to puke. I understand not getting all Nabokov with Perl syntax, but C'mon! – Alan May 21 '10 at 08:06
  • 1
    So wait, you think that "if I define C++ so it looks exactly like C", it's actually a good thing? It's not. If you want C, you should just write C code and feed it to a C compiler. if you're going to use C++, **don't try to make it look like C**. – jalf May 21 '10 at 10:39
  • 2
    @jalf: Where did I say that? I said I'm learning C++. I haven't learned the ins and outs of streams yet and when I do I'll be sure to learn them inside out, but for now I want to output formatted data and the only way that I'm familiar with is using printf style. I know this isn't the "C++" way. This code isn't going into a nuclear missile launcher, it's to help me eventually get to grips with C++. – dreamlax May 21 '10 at 11:18
  • @jalf: But, now I see, streams are a very important part of C++ and should definitely be the next thing that I should try and get under my belt. – dreamlax May 21 '10 at 11:24
  • 1
    @jalf: Streams are horrible in my opinion. They are so unintuitive and this is perhaps reflected by the fact that no other language seems to use this technique, and in fact, many languages either did something else for formatted I/O or followed a C-style approach. I think I'll avoid C++ altogether if the mindset of many seems to be "streams are the best thing since sliced bread and if you don't use them then don't use C++". – dreamlax Mar 20 '12 at 11:05
  • C++'s iostreams are pretty awkward, sure. I don't think I've ever met a C++ programmer who thought they were the best thing since sliced bread. But it's important to realize that C's stdio is also a horrible API to use in C++. Several libraries have defined some fairly nice compromises which give you the best of both worlds (Boost.Format or the FastFormat library come to mind). But blindly emulating the broken and unsafe C stdio isn't a good idea. – jalf Mar 20 '12 at 11:37

7 Answers

13

With the current standard (the upcoming standard differs here) there is no guarantee that the internal memory buffer managed by std::string is contiguous, or that the .c_str() method returns a pointer to the internal data representation (the implementation is allowed to generate a contiguous read-only block for that operation and return a pointer into it). A pointer to the actual internal data can be retrieved with the .data() member function, but note that it also returns a pointer to const: it is not intended for you to modify the contents. The buffer returned by .data() is not necessarily null-terminated; the implementation only needs to guarantee null termination when c_str() is called, so even in implementations where .data() and .c_str() return the same pointer, the implementation can add the \0 to the end of the buffer when the latter is called.

The standard intended to allow rope implementations, so in principle it is unsafe to do what you are trying, and from the point of view of the standard you should use an intermediate std::vector (contiguity is guaranteed, and &myvector[0] is guaranteed to point to the first element of the real buffer).

In all implementations I know of, the internal memory handled by std::string is actually a contiguous buffer, but writing through .data() is undefined behavior (writing to a constant object). Even if incorrect, it might work (I would avoid it). You should use other libraries that are designed for this purpose, like boost::format.

About the null termination: if you finally decide to follow the path of the undefined, you would need to allocate extra space for the null terminator, since the library will write it into the buffer. Now, the problem is that unlike C-style strings, std::strings can hold null characters internally, so you will have to resize the string down to fit the largest contiguous block of memory from the beginning that contains no \0. That is probably the issue you are finding with spurious null characters. This means that the bad approach of using vsnprintf (or the rest of the family) has to be followed by str.resize( strlen( str.c_str() ) ) to discard all contents of the string after the first \0.
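
The trimming step described above can be sketched roughly as follows. This is only an illustration, not the answer's original code: it assumes C++11 or later (where std::string storage is contiguous and may be written through &s[0]), it uses snprintf rather than vsnprintf to keep the example short, and the helper name greet is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: formats "Hi <name>" into a std::string.
// Assumes C++11 or later, where std::string storage is contiguous.
std::string greet(const std::string &name)
{
    // First pass: a null buffer of size 0 only measures the output.
    int needed = std::snprintf(nullptr, 0, "Hi %s", name.c_str());
    if (needed < 0)
        return std::string(); // formatting error

    // One extra element so snprintf has room for the '\0' it writes.
    std::string out(static_cast<std::size_t>(needed) + 1, '\0');
    int written = std::snprintf(&out[0], out.size(), "Hi %s", name.c_str());
    assert(written == needed);
    (void)written; // silence unused-variable warnings when NDEBUG is set

    // out.size() still counts the terminator; shrink to the C-string
    // length so the '\0' is not sent along with the payload.
    out.resize(std::strlen(out.c_str()));
    return out;
}
```

Without the final resize, length() would be one larger than the visible text, and writing c_str() with length() to a socket would transmit the trailing null byte.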

Overall, I would advise against this approach, and insist on either getting used to the C++ way of formatting, using third party libraries (boost is third party, but it is also the most standard non-standard library), using vectors or managing memory like in C... but that last option should be avoided like the plague.

// A safe way in C++ of using the snprintf family:
std::vector<char> tmp( 1000 ); // expected maximum size
snprintf( &tmp[0], tmp.size(), "Hi %s", name.c_str() ); // assuming name to be a std::string
std::string salute( &tmp[0] ); // copies up to the first '\0'
David Rodríguez - dribeas
  • Regarding the null terminator: In my documentation of vsnprintf (VS2008) the following is mentioned: "If there is room at the end (that is, if the number of characters to write is less than count), the buffer will be null-terminated." – rjnilsson May 21 '10 at 08:28
  • 1
    My understanding is that the C++ committee added the guarantee of contiguous storage after determining that no current implementation violates it. – Mark Ransom May 21 '10 at 16:45
  • Yes, I have also read that before. There are a couple of things that were in the standard to support specific implementations and that are being forgotten. The support for rope implementations did not really get anywhere. Some implementations used/use copy on write semantics, but that also is disappearing as copy-on-write does not play well with multithreading, and the whole business is moving in that direction... – David Rodríguez - dribeas May 22 '10 at 10:36
  • 3
    Do note that C++11, the final standard, also guarantees that all `std::basic_string`s are null-terminated. Technically, they are terminated by `CharT()` (where `CharT` is the character type), which when used with `char` will produce the null-terminator. So `.data()` will return a null-terminated string, and you can *rely* on the terminator being there. Also note that the C++11 spec says that changing the terminator will result in undefined behavior. – Nicol Bolas May 11 '12 at 08:13
5

Use boost::format, if you prefer printf() over streams.

Edit: Just to make this clear, actually I fully agree with Alan, who said you should use streams.

sbi
2

I think that there are no guarantees that the layout of the string as referenced by &output[0] is contiguous and that you can write to it.

Use std::vector instead as a buffer which is guaranteed to have contiguous storage since C++03.

using namespace std;

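// NOTE: va_start on a reference parameter is undefined behaviour;
// taking `const char *format` instead would avoid that.
string formatStdString(const string &format, ...)
{
    va_list va;
    vector<string::value_type> output(1); // ensure some storage is allocated
    int needed;
    int used;

    va_start(va, format);
    needed = vsnprintf(&output[0], 0, format.c_str(), va); // measure only
    va_end(va);
    if (needed < 0)
        return string(); // formatting error

    // vsnprintf always writes a null terminator, so leave room for it.
    // This also keeps &output[0] valid when needed == 0.
    output.resize(needed + 1);

    va_start(va, format);
    used = vsnprintf(&output[0], output.size(), format.c_str(), va); // use size()
    // assert(used == needed);
    va_end(va);
    (void)used;

    return string(output.begin(), output.begin() + needed); // drop the terminator
}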

NOTE: You'll have to set an initial size to the vector as the statement &output[0] can otherwise attempt to reference a non-existing item (as the internal buffer might not have been allocated yet).

rjnilsson
  • Is checking whether `needed != 0` necessary? If `needed` is 0, `output.size()` should also be 0, and `vsnprintf` won't attempt to write anything to the buffer. – dreamlax May 21 '10 at 08:17
  • 1
    @dreamlax: if you resize to 0, and then try to dereference the buffer (as in &output[0]) you are invoking undefined behaviour, IMHO. – rjnilsson May 21 '10 at 08:19
  • You're probably right. I keep forgetting that in C++, `[]` can be a function call. – dreamlax May 21 '10 at 08:23
1

1) You do not need to make space for the null terminator.
2) capacity() tells you how much space the string has reserved internally. length() tells you the length of the string. You probably don't want capacity()
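
To illustrate the difference between the two, here is a small sketch (the function name sizeVersusCapacity is made up for this example). It also shows that length() reports the stored element count, embedded '\0' characters included, rather than stopping at the first null the way strlen() does.

```cpp
#include <cstring>
#include <string>

// Hypothetical check: returns true if size and capacity behave as described.
bool sizeVersusCapacity()
{
    std::string s;
    s.reserve(64);                          // reserves storage, length unchanged
    bool ok = s.length() == 0 && s.capacity() >= 64;

    s.resize(10);                           // length is now 10, filled with '\0'
    ok = ok && s.length() == 10;            // length() counts the null characters
    ok = ok && std::strlen(s.c_str()) == 0; // strlen stops at the first one
    return ok;
}
```

So passing capacity() to vsnprintf tells it about storage that is not part of the string's logical contents, while length() describes exactly the bytes that will be sent.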

Richard
0

The std::string class takes care of the null terminator for you.

However, as pointed out, since you're using vsnprintf on the raw underlying string buffer (C anachronisms die hard...), you will have to ensure there is room for the null terminator.

Alan
0

My implementation for variable argument lists for functions is like this:

std::string format(const char *fmt, ...)
{
  using std::string;
  using std::vector;

  string retStr("");

  if (NULL != fmt)
  {
     va_list marker = NULL;

     // initialize variable arguments
     va_start(marker, fmt);

     // Get formatted string length adding one for NULL
     size_t len = _vscprintf(fmt, marker) + 1;

     // Create a char vector to hold the formatted string.
     vector<char> buffer(len, '\0');
     int nWritten = _vsnprintf_s(&buffer[0], buffer.size(), len, fmt, marker);

     if (nWritten > 0)
     {
        retStr = &buffer[0];
     }

     // Reset variable arguments
     va_end(marker);
  }

  return retStr;
}
Brent81
0

To help transition from C, I want to format a std::string using printf-style formatters.

Just don't :(

If you do this, you're not actually learning C++ but coding C with a C++ compiler. It's a bad mindset, bad practice, and it propagates the problems that the std::o*stream classes were created to avoid.

I realise stringstream is a more type-safe approach, but I find printf-style much easier to read and deal with (at least, for the time being).

It's not a more typesafe approach. It is a typesafe approach. More than that, it minimizes dependencies, it lowers the number of issues you have to keep track of (like explicit buffer allocation and keeping track of the null char terminator) and it makes it easier to maintain your code.

On top of that, it is completely extensible / customizable:

  • you can extend locale formatting

  • you can define the i/o operations for custom data types

  • you can add new types of output formatting

  • you can add new buffer i/o types (making for example std::clog write to a window)

  • you can plug in different error handling policies.

The std::o*stream family of classes is very powerful, and once you learn to use it correctly you are unlikely to go back.

Unless you have very specific requirements your time will probably be much better spent learning the o*stream classes than writing printf in C++.
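
As a rough illustration of two of the points above (field formatting and output for custom data types), here is a minimal sketch. The Point type and the describe function are hypothetical names invented for this example; the manipulators come from the standard <iomanip> header.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, for illustration only.
struct Point { int x; int y; };

// Teaches every output stream how to print a Point.
std::ostream &operator<<(std::ostream &os, const Point &p)
{
    return os << '(' << p.x << ", " << p.y << ')';
}

// Roughly the stream counterpart of a "%04x" conversion followed by text.
std::string describe(unsigned id, const Point &p)
{
    std::ostringstream oss;
    oss << "id=0x"
        << std::hex << std::setw(4) << std::setfill('0') << id // width applies to id only
        << std::dec << " at " << p;                            // back to decimal
    return oss.str();
}
```

For example, describe(255, Point{1, 2}) yields "id=0x00ff at (1, 2)". Note that std::setw affects only the next inserted item, while std::hex and std::setfill stay in effect until changed.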

utnapistim