7

What is the proper C++11 way to extract a set of characters out of a stringstream without using Boost?

I want to do it without copying, if possible, because this is used in a performance-critical data loop. It seems, though, that std::string does not allow direct access to its data.

For example, the code below performs a substring copy out of a stringstream:

inline std::string left(std::stringstream ss, uint32_t count) {
    char* buffer = new char[count];
    ss.get(buffer, count);
    std::string str(buffer);  // Second copy performed here
    delete buffer;
    return str;
}
  1. Should I even be using char *buffer according to c++11?
  2. How do I get around making a second copy?

My understanding is that vectors initialize every character, so I want to avoid that.

Also, this needs to be passed into a function which accepts const char *, so now after this runs I am forced to do a .c_str(). Does this also make a copy?

It would be nice to be able to pass back a const char *, but that seems to go against the "proper" c++11 style.

To understand what I am trying to do, here is "effectively" what I want to use it for:

fprintf( stderr, "Data: [%s]...", left(ststream, 255) );

But C++11 forces:

fprintf( stderr, "Data: [%s]...", left(str_data, 255).c_str() );

How many copies of that string am I making here?

How can I reduce it to only a single copy out of the stringstream?

user3072517
  • `std::string str(buffer);` causes undefined behaviour, `buffer` is not null-terminated – M.M Feb 22 '15 at 21:19
  • have you considered passing `ss` by const reference instead of by value? That would eliminate a copy. – M.M Feb 22 '15 at 21:20
  • giving a `std::string` to `fprintf` is undefined behaviour in all versions of C++ ; this was not changed by C++11 – M.M Feb 22 '15 at 21:27
  • Another option is `%.255s` with argument `str_data.str().c_str()` – M.M Feb 22 '15 at 22:22
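The `%.255s` idea from the last comment can be sketched as below. This is only an illustration: `print_left` is a hypothetical helper name, and `%.*s` is used so the limit can be passed as an argument instead of being hard-coded. Note that `c_str()` itself does not copy; the one copy here comes from `str()`, which returns the whole buffer by value.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper: prints at most max_chars characters of the stream's
// contents, letting the printf precision do the truncation.
// Returns the number of characters written, as reported by fprintf.
inline int print_left(std::FILE* out, const std::stringstream& ss, int max_chars) {
    const std::string data = ss.str();  // str() copies the entire buffer once
    // c_str() returns a pointer into `data`; it does not make another copy.
    return std::fprintf(out, "Data: [%.*s]...\n", max_chars, data.c_str());
}
```

If the stream holds far more data than is printed, the full copy made by `str()` may cost more than reading just the first few characters, so this trade-off is worth measuring.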

4 Answers

9

You could use the approach described in this link: How to create a std::string directly from a char* array without copying?

Basically, create a string, call resize() on it with the size that is passed to your function, and then pass a pointer to the string's first character to the stringstream's get() method. You will end up with only one copy.

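inline std::string left(std::stringstream& ss, uint32_t count) {
    std::string str;
    str.resize(count);
    ss.get(&str[0], count);   // reads at most count-1 characters, then writes '\0'
    str.resize(ss.gcount());  // trim to the characters actually read
    return str;
}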
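As a variation on the same idea, here is a minimal sketch that uses read() and gcount() instead of get(). It assumes you want up to `count` characters with no stopping at newlines, which differs from get(); the name `left_n` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical variant of left(): reads up to `count` characters directly
// into the string's own storage, then trims to what was actually read.
inline std::string left_n(std::istream& in, std::uint32_t count) {
    std::string str(count, '\0');  // one allocation, zero-filled
    if (count > 0) {
        in.read(&str[0], static_cast<std::streamsize>(count));
        // gcount() reports how many characters the last read() extracted.
        str.resize(static_cast<std::size_t>(in.gcount()));
    }
    return str;
}
```

Unlike get(), read() does not stop at a delimiter and does not append a null terminator, so the string's length ends up equal to the number of characters extracted.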
fvannee
  • I would also suggest passing the stringstream in by reference. We don't know the length of the data in the stringstream, and copying that much data could itself be the bottleneck. Additionally, I would note that returning the automatically created (local) variable should move the string rather than copy it, as long as this is compiled with C++11 or equivalent flags. – Freddy Feb 22 '15 at 21:13
  • Good point - stringstream should be passed by reference. The return value will never be copied though, as every compiler will use return value optimization anyway. – fvannee Feb 22 '15 at 21:17
  • Yes, I did actually use shared_ptr on the stream (just didn't do so in my sample code). – user3072517 Feb 22 '15 at 21:52
2

My suggestion:

  1. Create the std::string to be returned by giving it the size.

  2. Read the characters one by one from the stringstream and set the values in the std::string.

Here's what the function looks like:

inline std::string left(std::stringstream& ss, uint32_t count) {
    std::string str(count+1, '\0');
    uint32_t i = 0;
    for ( ; i < count; ++i )
    {
        int c = ss.get();  // streams provide get(), not getc()
        if ( c != EOF )
        {
           str[i] = static_cast<char>(c);
        }
        else
        {
           break;
        }
    }
    str.resize(i);  // drop the unused trailing nulls
    return str;
}
R Sahu
2

R Sahu, this I like! Obvious now that I see it done. ;-)

I do have one modification, though (I also pass a shared_ptr to the stream, which is what I actually had in my version):

In your initializer, you are filling the whole string with nulls. That isn't needed, so I propose this tweak:

inline std::string left(std::shared_ptr<std::stringstream> ss, uint32_t count) {
    std::string str;
    str.reserve(count);  // allocates capacity without initializing characters
    for(uint32_t i = 0; i < count; ++i) {
        int c = ss->get();
        if(c == EOF) {
            break;
        }
        // reserve() does not change size(), so writing str[i] here would be
        // undefined behaviour; push_back grows the string instead.
        str.push_back(static_cast<char>(c));
    }
    return str;  // std::string maintains its own terminating null
}

Now no characters are pre-filled with nulls; the string keeps its own terminating null.

Thanks R Sahu!

user3072517
  • The string class does not have a constructor which takes only an integer as parameter. You always need to specify the character to fill the string with as well. – fvannee Feb 22 '15 at 21:22
  • why add a `shared_ptr` when you can just use a reference? – oblitum Feb 22 '15 at 21:32
  • Because a reference still ends up copying. I did some performance tests and in all cases shared_ptr guarantees a pointer copy, but reference doesn't. In my case, it kept internally performing copies. – user3072517 Feb 22 '15 at 21:51
  • @user3072517 passing by reference does **not** do a copy of the stringstream or "internally perform copies". You must be misinterpreting what you are seeing in your test results. – M.M Feb 22 '15 at 22:13
  • So in my testing, I was actually doing a class rather than a string. I found tremendous performance improvements over using shared_ptr vs const Class&. This particular routine is executed 100s of millions of times and the time delta was significant. Specifically, it went from 48 seconds to over 1100 seconds before I killed it. – user3072517 Feb 23 '15 at 21:01
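One way to check the claim about references in the comments above is a small probe like the following. This is a sketch only: `is_same_object` is a hypothetical name, and it says nothing about the class used in the timing test mentioned above, only about what binding a `std::stringstream` to a reference does.

```cpp
#include <sstream>
#include <type_traits>

// Standard streams are not copy-constructible, so a stringstream cannot be
// copied as a side effect of binding it to a reference parameter.
static_assert(!std::is_copy_constructible<std::stringstream>::value,
              "std::stringstream is expected to be non-copyable");

// Hypothetical probe: reports whether the reference parameter refers to the
// very object whose address the caller supplies.
inline bool is_same_object(const std::stringstream& ss, const void* caller_address) {
    return static_cast<const void*>(&ss) == caller_address;
}
```

Calling `is_same_object(stream, &stream)` should return true, since the function sees the caller's object and not a copy.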
0

If the purpose of this function is solely for passing to fprintf or another C-style stream, then you could avoid allocation completely by doing the following:

void left(FILE *out, std::stringstream &in, size_t count)
{
    in.seekg(0);
    char ch;
    while ( count-- && in.get(ch) )
        fputc(static_cast<unsigned char>(ch), out);  // fputc takes the character first, then the FILE*
}

Usage:

fprintf( stderr, "Data: [" );
left(stderr, stream, 255);
fprintf( stderr, "] ...\n");

Bear in mind that another seekg will be required if you try to use the stream reading functions on the stringstream later; and it would not surprise me if this is the same speed or slower than the options involving str().
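To illustrate the point about needing another seekg, here is a minimal sketch that saves the read position, reads, and then restores it. The name `peek_left` is hypothetical, and the sketch assumes the stream is in a good state on entry.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: returns up to `count` characters from the current read
// position and then puts the read position back where it was.
inline std::string peek_left(std::stringstream& in, std::size_t count) {
    const std::stringstream::pos_type saved = in.tellg();  // remember position
    std::string out(count, '\0');
    in.read(&out[0], static_cast<std::streamsize>(count));
    out.resize(static_cast<std::size_t>(in.gcount()));     // keep only what was read
    in.clear();       // a short read sets eofbit/failbit; clear before seeking
    in.seekg(saved);  // restore the read position for later stream operations
    return out;
}
```

Without the clear() call, a read that hits the end of the data leaves failbit set, and the following seekg would have no effect.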

M.M