34

If I construct a string made of a list of space separated floating point values using std::ostringstream:

std::ostringstream ss;
unsigned int s = floatData.size();
for(unsigned int i=0;i<s;i++)
{
    ss << floatData[i] << " ";
}

Then I get the result in a std::string:

std::string textValues(ss.str());

However, this will cause an unnecessary deep copy of the string contents, as ss will not be used anymore.

Is there any way to construct the string without copying the entire content?

Baum mit Augen
galinette
  • 1
    Are you sure that is copying? It's a perfectly reasonable case for applying RVO, I think. Inspect your assembly to see what your compiler is doing. – Manu343726 Oct 08 '14 at 21:19
  • 2
    @Manu343726 RVO applies to a `return` value. There is no `return` here. – Walter Oct 08 '14 at 21:22
  • 6
    Standard says about str(): "returns a string object with a copy of the current contents of the stream." So yes it copies – galinette Oct 08 '14 at 21:23
  • @galinette you don't construct an `istringstream` anywhere here. – Walter Oct 08 '14 at 21:24
  • @walter : sorry I was doing too many things at the same time. Corrected – galinette Oct 08 '14 at 21:26
  • 1
    As QoI, an implementation could do something nice with `move(ss).str()`, but I don't know if any does right now. – Marc Glisse Oct 08 '14 at 21:26
  • @MarcGlisse it can't, because it doesn't know from inside `str()` if there will be more writes or not. – Anton Savin Oct 08 '14 at 21:29
  • I don't really know if this is exactly what you want, but you could use `ss.rdbuf()` which is supposed not to create the intermediate string. – Javi Oct 08 '14 at 21:29
  • @Walter I mean the possible RVO from `.str()`. About the Standard quote, it's a "copy" in an abstract sense, since a string is a different medium than a stream. But the implementation could do whatever it likes. Practically speaking, who cares how the implementation works, whether the stream's data is buffered and can easily be moved into the string instead of copied, etc... – Manu343726 Oct 08 '14 at 21:29
  • @MarcGlisse : can you write a member function prototype knowing that "*this" is a rvalue? – galinette Oct 08 '14 at 21:30
  • 1
    @galinette Yes, you can, though I think most compilers don't support that well. – Walter Oct 08 '14 at 21:31
  • @galinette yes http://akrzemi1.wordpress.com/2014/06/02/ref-qualifiers/ – Manu343726 Oct 08 '14 at 21:31
  • Nice to know, I currently explicitly use "move_" prefixed non const functions to "pop" derived values for doing this – galinette Oct 08 '14 at 21:39
  • I am pretty sure RVO is applied on the str function. Why don't you step in your debugger to find out? – Neil Kirk Oct 08 '14 at 21:43
  • You might take a look at http://stackoverflow.com/questions/1494182/setting-the-internal-buffer-used-by-a-standard-stream-pubsetbuf. Pity that there isn't a constructor taking a `string &` that would just use this as the underlying buffer. – sfjac Oct 08 '14 at 21:45
  • [May be useful](http://stackoverflow.com/questions/1494182/setting-the-internal-buffer-used-by-a-standard-stream-pubsetbuf) - set your ostringstream to write to an external buffer that you have full control over – M.M Oct 08 '14 at 22:22
  • @MattMcNabb : I'd like to, but ostringstream does not allow this. It will not write to an external buffer. – galinette Oct 08 '14 at 22:23
  • [std::ostrstream](http://en.cppreference.com/w/cpp/io/ostrstream) will write to your buffer... – Cubbi Oct 09 '14 at 03:01
  • @Cubbi : that's deprecated!!! Moreover, it will write to a char *, not to a string. This means it will not expand it when needed leading to buffer overrun. This might be the reason why it is deprecated. – galinette Oct 09 '14 at 07:17
  • @galinette it will expand automatically unless you request a fixed-size output. Why do you think they can't remove it from the standard even though it was "deprecated" already in 1998? Of course it's not too hard to write your own streambuf with similar properties. – Cubbi Oct 09 '14 at 17:29

6 Answers

16

This is now possible with C++20, with syntax like:

const std::string s = std::move(ss).str();
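As a minimal sketch of that line in context (the helper name `join_floats_cpp20` is hypothetical, not from the original post), the question's loop could look like this. Under a pre-C++20 standard the same code should still compile, but it would then pick the copying `str() const` overload:

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: formats the values, then hands the stream's
// contents to the caller.
std::string join_floats_cpp20(const std::vector<float>& values) {
    std::ostringstream ss;
    for (float f : values)
        ss << f << ' ';
    // With C++20 this selects the rvalue-qualified str() overload, which is
    // allowed to move the internal buffer out instead of copying it.
    // Before C++20 it falls back to the copying str() const.
    return std::move(ss).str();
}
```

Calling `join_floats_cpp20({1.5f, -2.0f, 0.25f})` yields the space-separated text with a trailing space, as in the question.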

This is possible because the std::ostringstream class now has a str() overload that is rvalue-ref qualified:

basic_string<charT, traits, Allocator> str() &&;  // since C++20

This was added in P0408, revision 7, which was adopted into C++20.

This is the exact approach suggested by @MarcGlisse in a prescient comment from October 2014.
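
The mechanism behind an rvalue-ref-qualified overload can be illustrated with a toy class. This is only a sketch: `text_builder` and its members are hypothetical names and say nothing about how any standard library implements `std::ostringstream`.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy class (not std::ostringstream) with ref-qualified
// str() overloads.
class text_builder {
public:
    void append(const std::string& piece) { text_ += piece; }

    // Chosen when *this is an lvalue: the builder stays usable,
    // so the text has to be copied.
    std::string str() const& { return text_; }

    // Chosen when *this is an rvalue: the builder is expiring,
    // so the text can be moved out.
    std::string str() && { return std::move(text_); }

    std::size_t stored_size() const { return text_.size(); }

private:
    std::string text_;
};
```

With a `text_builder b`, `b.str()` copies and leaves `b` intact, while `std::move(b).str()` picks the `&&` overload and may steal the buffer.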

NicholasM
14

std::ostringstream offers no public interface to access its in-memory buffer unless it non-portably supports pubsetbuf (but even then your buffer is fixed-size; see the cppreference example).

If you want to torture some string streams, you could access the buffer using the protected interface:

#include <iostream>
#include <sstream>
#include <vector>

struct my_stringbuf : std::stringbuf {
    const char* my_str() const { return pbase(); } // pptr might be useful too
};

int main()
{
    std::vector<float> v = {1.1, -3.4, 1/7.0};
    my_stringbuf buf;
    std::ostream ss(&buf);
    for(unsigned int i=0; i < v.size(); ++i)
        ss << v[i] << ' ';
    ss << std::ends;
    std::cout << buf.my_str() << '\n';
}
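
Following the `pptr` hint in the comment above, here is a hedged sketch that exposes the written range as pointer plus length, so no terminating `std::ends` is needed. The names `peekable_stringbuf` and `count_separators` are hypothetical, and the sketch assumes the stream is only written to sequentially (no seeking), so that the written data is the range from `pbase()` to `pptr()`.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical name: exposes the written range [pbase(), pptr())
// of a std::stringbuf through the protected interface.
struct peekable_stringbuf : std::stringbuf {
    const char* data() const { return pbase(); }
    std::size_t size() const {
        return static_cast<std::size_t>(pptr() - pbase());
    }
};

// Hypothetical helper: inspects the formatted text in place, without
// calling str() and therefore without creating a std::string copy.
std::size_t count_separators(const std::vector<float>& values) {
    peekable_stringbuf buf;
    std::ostream os(&buf);
    for (float f : values)
        os << f << ' ';

    std::size_t separators = 0;
    const char* p = buf.data();
    for (std::size_t i = 0; i < buf.size(); ++i)
        if (p[i] == ' ')
            ++separators;
    return separators;
}
```

This only gives read access to the buffer; turning it into a `std::string` would still copy.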

The standard C++ way of directly accessing an auto-resizing output stream buffer is offered by std::ostrstream, deprecated in C++98 but still part of standard C++14 and counting:

#include <iostream>
#include <strstream>
#include <vector>

int main()
{
    std::vector<float> v = {1.1, -3.4, 1/7.0};
    std::ostrstream ss;
    for(unsigned int i=0; i < v.size(); ++i)
        ss << v[i] << ' ';
    ss << std::ends;
    const char* buffer = ss.str(); // direct access!
    std::cout << buffer << '\n';
    ss.freeze(false); // abomination
}

However, I think the cleanest (and the fastest) solution is Boost.Karma:

#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/spirit/include/karma.hpp>
namespace karma = boost::spirit::karma;
int main()
{
    std::vector<float> v = {1.1, -3.4, 1/7.0};
    std::string s;
    // Append the values to s, separated by single spaces
    karma::generate(std::back_inserter(s), karma::double_ % ' ', v);
    std::cout << s << '\n'; // here's your string
}
Cubbi
  • 1
    +1 for the Karma approach of course. However, when Boost is in the picture, why not simply [use Boost Iostreams and have `ostream` write to a container or array transparently](http://stackoverflow.com/a/43856499/85371) :) – sehe May 08 '17 at 20:21
  • @sehe thanks, and yes, boost::iostreams::array_sink is certainly worth mentioning (after all, cppreference's page on [std::ostrstream](http://en.cppreference.com/w/cpp/io/ostrstream) mentions it) – Cubbi May 09 '17 at 02:40
  • 1
    A much simpler approach is now possible in C++20, as detailed in a [sibling answer](https://stackoverflow.com/a/66662433/1718575). – NicholasM Apr 04 '23 at 16:32
5

+1 for the Boost Karma by @Cubbi and the suggestion to "create your own streambuf-derived type that does not make a copy, and give that to the constructor of a basic_istream<>".

A more generic answer, though, is missing, and sits between these two. It uses Boost Iostreams:

using string_buf = bio::stream_buffer<bio::back_insert_device<std::string> >;

Here's a demo program:

#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream_buffer.hpp>

namespace bio = boost::iostreams;

using string_buf = bio::stream_buffer<bio::back_insert_device<std::string> >;

// any code that uses ostream
void foo(std::ostream& os) {
    os << "Hello world " 
       << std::hex << std::showbase << 42
       << " " << std::boolalpha << (1==1) << "\n";
}

#include <iostream>

int main() {
    std::string output;
    output.reserve(100); // optional: pre-allocate if you know roughly how large the output will be, or its minimum size

    {
        string_buf buf(output);
        std::ostream os(&buf);
        foo(os);
    }

    std::cout << "Output contains: " << output;
}

Note that you can trivially replace the std::string with std::wstring, or std::vector<char> etc.

Even better, you can use it with the array_sink device and have a fixed-size buffer. That way you can avoid any buffer allocation whatsoever with your Iostreams code!

#include <boost/iostreams/device/array.hpp>

using array_buf = bio::stream_buffer<bio::basic_array<char>>;

// ...

int main() {
    char output[100] = {0};

    {
        array_buf buf(output);
        std::ostream os(&buf);
        foo(os);
    }

    std::cout << "Output contains: " << output;
}

Both programs print:

Output contains: Hello world 0x2a true
sehe
  • Added a fixed-array buffer example that works with anything that accepts `std::istream` or `std::ostream` – sehe May 08 '17 at 20:19
  • Can the string `output` be cleared at will? Or will this break the stream? – Lightness Races in Orbit May 08 '17 at 23:28
  • @BoundaryImposition Interesting question. If `back_insert_device` does do what the name suggests, that should be fine. I don't think I'd want to rely on that, since instantiating a new stream_buffer should not be expensive. – sehe May 09 '17 at 00:16
  • Is the reserve(100) important, or just speed optimization when output size can be determined? – AkariAkaori Oct 27 '17 at 01:05
  • @AkariAkaori it's only allocation optimization, as the documentation of `reserve` will confirm – sehe Oct 27 '17 at 09:30
4

I implemented an "outstringstream" class, which I believe does exactly what you need (see the take_str() method). I partially used code from: What is wrong with my implementation of overflow()?

#include <cstddef>
#include <cstring>
#include <ostream>
#include <string>

template <typename char_type>
class basic_outstringstream : private std::basic_streambuf<char_type, std::char_traits<char_type>>,
                              public std::basic_ostream<char_type, std::char_traits<char_type>>
{
    using traits_type = std::char_traits<char_type>;
    using base_buf_type = std::basic_streambuf<char_type, traits_type>;
    using base_stream_type = std::basic_ostream<char_type, traits_type>;
    using int_type = typename base_buf_type::int_type;

    std::basic_string<char_type> m_str;

    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);

        if (m_str.empty())
            m_str.resize(1);
        else
            m_str.resize(m_str.size() * 2);

        const std::ptrdiff_t diff = this->pptr() - this->pbase();
        this->setp(&m_str.front(), &m_str.back());

        this->pbump(diff);
        *this->pptr() = traits_type::to_char_type(ch);
        this->pbump(1);

        return traits_type::not_eof(traits_type::to_int_type(*this->pptr()));
    }

    void init()
    {
        this->setp(&m_str.front(), &m_str.back());

        const std::size_t size = m_str.size();
        if (size)
        {
            memcpy(this->pptr(), &m_str.front(), size);
            this->pbump(size);
        }
    }

public:

    explicit basic_outstringstream(std::size_t reserveSize = 8)
        : base_stream_type(this)
    {
        m_str.reserve(reserveSize);
        init();
    }

    explicit basic_outstringstream(std::basic_string<char_type>&& str)
        : base_stream_type(this), m_str(std::move(str))
    {
        init();
    }

    explicit basic_outstringstream(const std::basic_string<char_type>& str)
        : base_stream_type(this), m_str(str)
    {
        init();
    }

    const std::basic_string<char_type>& str() const
    {
        return m_str;
    }

    std::basic_string<char_type>&& take_str()
    {
        return std::move(m_str);
    }

    void clear()
    {
        m_str.clear();
        init();
    }
};

using outstringstream = basic_outstringstream<char>;
using woutstringstream = basic_outstringstream<wchar_t>;
Kuba S.
  • This is a good start, but shouldn't return a reference from `str()` and probably needs `xsputn()` and/or `sync()` overrides. I'm still working on it. – Lightness Races in Orbit May 05 '17 at 17:32
  • Ok - no need for `xsputn()` or `sync()`, but your use of `&m_str.front()` and `&m_str.back()` in `init()` is broken; this has UB when the string is empty. With GCC 4.8.5, `&m_str.front()` is one _after_ `&m_str.back()` in this case!! Then [`streamsize` in `xsputn()`](https://gcc.gnu.org/onlinedocs/gcc-4.8.5/libstdc++/api/a01267_source.html) is -1 (rather than 0) and all hell breaks loose. `&m_str[0]` and `&m_str[m_str.size()]` should work (even when the latter is one-past-the-end; an impl kinda _has_ to work that way in C++11). – Lightness Races in Orbit May 08 '17 at 18:59
  • Frankly a `vector` would be much safer all around (especially when you risk COW being in play, *cough* GCC), but it's not as useful to whoever's calling `take_str()`. – Lightness Races in Orbit May 08 '17 at 18:59
  • I reckon a call to `setp` in `str()` (between string copy and return) should finish the job. Here's my current implementation, in case you're interested and/or want to incorporate my changes: https://pastebin.com/jLZ3TF3b – Lightness Races in Orbit May 08 '17 at 19:06
  • Eesh, remove the silly (and broken) `xsputn()` I left in there by mistake ;) – Lightness Races in Orbit May 08 '17 at 20:47
  • @LightnessRacesinOrbit reference from str() seems to me a good idea and i would rather move to a C++11 version that guarantees non COW string – ceztko Jul 10 '19 at 11:04
  • @LightnessRacesinOrbit the code is already C++11 dependent, so I don't think your version of `str()` taking care of COW in pastebin is really needed, unless you are using non compliant C++11 compiler. I agree that the use of `front()` and `back()` looks a bit fishy, but for that I would rather use `basic_string::data()` and pointer arithmetic – ceztko Jul 10 '19 at 11:42
  • 1
    @ceztko [libstdc++ even in C++11 mode had COW strings for several years](https://stackoverflow.com/questions/12199710/legality-of-cow-stdstring-implementation-in-c11#comment35874690_12199710) (and, yes, this was non-compliant). It's fine since GCC 5, though. In reality I'd be tempted to use a compile-time check to build out that added hack for compliant toolchains. – Lightness Races in Orbit Jul 10 '19 at 11:43
  • @LightnessRacesinOrbit Ok, returning a reference to the string was also not good because the internal string doesn't have the correct buffer content size. This was not correct in your pastebin code either. Since Kuba is unresponsive, I posted a new [answer](https://stackoverflow.com/a/56978001/213871) with this fix and other fixes and improvements. We can continue the discussion there if you have other contributions. – ceztko Jul 10 '19 at 20:20
1

Update: In the face of people's continued dislike of this answer, I thought I'd make an edit and explain.

  1. No, there is no way to avoid a string copy (stringbuf has the same interface)

  2. It will never matter. It's actually more efficient that way. (I will try to explain this)

Imagine writing a version of stringbuf that keeps a perfect, moveable std::string available at all times. (I have actually tried this).

Adding characters is easy - we simply use push_back on the underlying string.

OK, but what about removing characters (reading from the buffer)? We'll have to move some pointer to account for the characters we've removed, all well and good.

However, we have a problem - the contract we're keeping that says we'll always have a std::string available.

So whenever we remove characters from the stream, we'll need to erase them from the underlying string. That means shuffling all the remaining characters down (memmove/memcpy). Because this contract must be kept every time the flow of control leaves our private implementation, this in practice means having to erase characters from the string every time we call getc or gets on the string buffer. This translates to a call to erase on every >> operation on the stream.

Then of course there's the problem of implementing the pushback buffer. If you pushback characters into the underlying string, you've got to insert them at position 0 - shuffling the entire buffer up.

The long and short of it is that you can write an ostream-only stream buffer purely for building a std::string. You'll still need to deal with all the reallocations as the underlying buffer grows, so in the end you get to save exactly one string copy. So perhaps we go from 4 string copies (and calls to malloc/free) to 3, or 3 to 2.

You'll also need to deal with the problem that the streambuf interface is not split into istreambuf and ostreambuf. This means you still have to offer the input interface and either throw exceptions or assert if someone uses it. This amounts to lying to users - we've failed to implement an expected interface.

For this tiny improvement in performance, we must pay the cost of:

  1. developing a (quite complex, when you factor in locale management) software component.

  2. suffering the loss of flexibility of having a streambuf which only supports output operations.

  3. Laying landmines for future developers to step on.
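
For reference, the output-only stream buffer described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a complete component: `string_sink_buf` and `join_floats_sink` are hypothetical names, the buffer is unbuffered (no put area), and the input side of the interface is simply left at the `std::streambuf` defaults.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical name: an unbuffered, output-only streambuf that appends
// every character it receives to a std::string it owns.
class string_sink_buf : public std::streambuf {
public:
    // Moves the accumulated text out and leaves the sink empty.
    std::string take() {
        std::string out = std::move(text_);
        text_.clear();
        return out;
    }

protected:
    // Called for single characters because no put area is set up.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            text_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Called for character runs; appends them in one step.
    std::streamsize xsputn(const char_type* s, std::streamsize n) override {
        text_.append(s, static_cast<std::size_t>(n));
        return n;
    }

private:
    std::string text_;
};

// Hypothetical helper mirroring the loop from the question.
std::string join_floats_sink(const std::vector<float>& values) {
    string_sink_buf buf;
    std::ostream os(&buf);
    for (float f : values)
        os << f << ' ';
    return buf.take();
}
```

The string still reallocates as it grows; what is saved is the final copy out of the stream.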

Richard Hodges
  • 6
    "String copies on a modern cpu are extremely cheap" Are they? What if my program needs to parse a few gigabytes of text data? (Sometimes it does) – Neil Kirk Oct 08 '14 at 21:42
  • 2
    The parsing will take longer than the copying, by a huge factor. – Richard Hodges Oct 08 '14 at 21:44
  • Consider the following test code: http://pastebin.com/YYtT6VwH In release they are the same speed actually, but in debug mode (which I need to use too) f1 is nearly twice as fast. – Neil Kirk Oct 08 '14 at 22:16
  • @Neil : yes I'm working on somewhat large lists (several tens of megabytes), file I/O takes about 30s and profiling shows I'm constructing strings all the time. – galinette Oct 08 '14 at 22:20
  • 1
    @galinette If speed is very important, unfortunately, you have to use C parsing. It's not a popular fact, but it is faster. – Neil Kirk Oct 08 '14 at 22:27
  • Same for parsing floats out of a large text file... It seems that c++ does not allow doing this without having two copies of the text data in memory at the same time at one point of the code – galinette Oct 08 '14 at 22:32
  • what about `double d; stream >> d;` ? – Richard Hodges Oct 09 '14 at 08:42
  • @Richard : Sorry I meant parsing floats out of a large string (not file). You cannot make a stream for parsing the string without duplicating the whole string in memory – galinette Oct 09 '14 at 12:52
  • @galinette You can create your own streambuf-derived type that does not make a copy, and give that to the constructor of a basic_istream<>. This gives you utility with non-copying efficiency (if that is truly important). Again though, I would reiterate that parsing ascii characters to doubles is a lot more expensive than merely copying the string containing the ascii characters. If you want efficiency, you might want to avoid the need for string-double conversions until the point of absolute necessity. i.e. the point at which the strings enter/leave the library/program. – Richard Hodges Oct 09 '14 at 13:28
  • 2
    @NeilKirk If your program really is copying gigabytes of string data then there are a number of techniques for iterating over the data without reading it all into memory. Memory-mapped files, converting direct from the input stream, batch processing, not converting (store in binary format) etc etc. Efficiency is almost always a problem of choosing the correct algorithm, not optimising an existing algorithm. – Richard Hodges Oct 09 '14 at 13:34
  • I know it was just an example. The point is minimizing string copies is a good idea. It might not always matter, but sometimes it does. Also I can't be bothered making my own streams and allocators for something that should be provided by the language automatically. If speed is critical, it's back to C parsing for me, unfortunately. – Neil Kirk Oct 09 '14 at 13:36
  • `fscanf` is still faster than `stream >> d;` for huge data. – Neil Kirk Oct 09 '14 at 13:50
  • That's probably true, but what you gain in speed you pay in safety. let the buyer beware :-) – Richard Hodges Oct 09 '14 at 16:52
  • @NeilKirk C library's parsers/formatters are very slow too (relative to special-purpose libraries that don't have to honor locales). – Cubbi Oct 10 '14 at 04:14
  • @Cubbi Yep we have our own parser for hex-only data. – Neil Kirk Oct 10 '14 at 09:27
  • @RichardHodges I like the remarks about fallacies lurking there. However, output-only streams into a pre-allocated buffer are obviously very useful. How do you like the ~4-line approach¹ for that in [my answer](http://stackoverflow.com/a/43856499/85371) (¹ using Boost)? – sehe May 08 '17 at 20:42
  • @sehe exactly what I would do. boost iostreams is awesome (but the documentation sucks!) – Richard Hodges May 08 '17 at 21:10
  • @RichardHodges Not sure about "awesome" (I think it has the most crippling design warts of all boost libraries) but these building blocks are pretty functional indeed! – sehe May 08 '17 at 21:19
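
The non-copying input stream buffer suggested in the comments above (a streambuf-derived type handed to an istream) might look roughly like the following sketch. `view_buf` and `parse_floats` are hypothetical names, and the sketch assumes the source string outlives the buffer and is not modified while being read.

```cpp
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical name: a read-only streambuf whose get area points straight
// at an existing string, so the text is never duplicated.
class view_buf : public std::streambuf {
public:
    explicit view_buf(const std::string& source) {
        // setg() takes non-const pointers; this class never writes through
        // them, and the default pbackfail() refuses to alter the data.
        char* begin = const_cast<char*>(source.data());
        setg(begin, begin, begin + source.size());
    }
};

// Hypothetical helper: parses whitespace-separated floats in place.
std::vector<float> parse_floats(const std::string& text) {
    view_buf buf(text);
    std::istream in(&buf);
    std::vector<float> values;
    for (float f; in >> f;)
        values.push_back(f);
    return values;
}
```

The parsing itself still goes through the locale-aware stream extraction, which, as noted in the comments, is likely to cost more than the copy it avoids.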
0

I adapted @Kuba's very good answer to fix some issues (unfortunately he's currently unresponsive). In particular:

  • added a safe_pbump to handle 64 bit offsets;
  • return a string_view instead of string (internal string doesn't have the right size of the buffer);
  • resize the string to current buffer size on the move semantics take_str method;
  • fixed take_str method move semantics with init before return;
  • removed a useless memcpy on init method;
  • renamed the template parameter char_type to CharT to avoid ambiguity with basic_streambuf::char_type;
  • used string::data() and pointer arithmetic instead of possible undefined behavior using string::front() and string::back(), as pointed out by @LightnessRacesinOrbit;
  • implemented the stream buffer by composition instead of inheritance.
#pragma once

#include <cstdlib>
#include <limits>
#include <ostream>
#include <string>
#if __cplusplus >= 201703L
#include <string_view>
#endif

namespace usr
{
    template <typename CharT>
    class basic_outstringstream : public std::basic_ostream<CharT, std::char_traits<CharT>>
    {
        using traits_type = std::char_traits<CharT>;
        using base_stream_type = std::basic_ostream<CharT, traits_type>;

        class buffer : public std::basic_streambuf<CharT, std::char_traits<CharT>>
        {
            using base_buf_type = std::basic_streambuf<CharT, traits_type>;
            using int_type = typename base_buf_type::int_type;

        private:
            void safe_pbump(std::streamsize off)
            {
                // pbump doesn't support 64 bit offsets
                // https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47921
                int maxbump;
                if (off > 0)
                    maxbump = std::numeric_limits<int>::max();
                else if (off < 0)
                    maxbump = std::numeric_limits<int>::min();
                else // == 0
                    return;

                while (std::abs(off) > std::numeric_limits<int>::max())
                {
                    this->pbump(maxbump);
                    off -= maxbump;
                }

                this->pbump((int)off);
            }

            void init()
            {
                this->setp(const_cast<CharT *>(m_str.data()),
                    const_cast<CharT *>(m_str.data()) + m_str.size());
                this->safe_pbump((std::streamsize)m_str.size());
            }

        protected:
            int_type overflow(int_type ch) override
            {
                if (traits_type::eq_int_type(ch, traits_type::eof()))
                    return traits_type::not_eof(ch);

                if (m_str.empty())
                    m_str.resize(1);
                else
                    m_str.resize(m_str.size() * 2);

                size_t size = this->size();
                this->setp(const_cast<CharT *>(m_str.data()),
                    const_cast<CharT *>(m_str.data()) + m_str.size());
                this->safe_pbump((std::streamsize)size);
                *this->pptr() = traits_type::to_char_type(ch);
                this->pbump(1);

                return ch;
            }

        public:
            buffer(std::size_t reserveSize)
            {
                m_str.reserve(reserveSize);
                init();
            }

            buffer(std::basic_string<CharT>&& str)
                : m_str(std::move(str))
            {
                init();
            }

            buffer(const std::basic_string<CharT>& str)
                : m_str(str)
            {
                init();
            }

        public:
            size_t size() const
            {
                return (size_t)(this->pptr() - this->pbase());
            }

#if __cplusplus >= 201703L
            std::basic_string_view<CharT> str() const
            {
                return std::basic_string_view<CharT>(m_str.data(), size());
            }
#endif
            std::basic_string<CharT> take_str()
            {
                // Resize the string to actual used buffer size
                m_str.resize(size());
                std::basic_string<CharT> ret = std::move(m_str);
                init();
                return ret;
            }

            void clear()
            {
                m_str.clear();
                init();
            }

            const CharT * data() const
            {
                return m_str.data();
            }

        private:
            std::basic_string<CharT> m_str;
        };

    public:
        explicit basic_outstringstream(std::size_t reserveSize = 8)
            : base_stream_type(nullptr), m_buffer(reserveSize)
        {
            this->rdbuf(&m_buffer);
        }

        explicit basic_outstringstream(std::basic_string<CharT>&& str)
            : base_stream_type(nullptr), m_buffer(std::move(str))
        {
            this->rdbuf(&m_buffer);
        }

        explicit basic_outstringstream(const std::basic_string<CharT>& str)
            : base_stream_type(nullptr), m_buffer(str)
        {
            this->rdbuf(&m_buffer);
        }

#if __cplusplus >= 201703L
        std::basic_string_view<CharT> str() const
        {
            return m_buffer.str();
        }
#endif
        std::basic_string<CharT> take_str()
        {
            return m_buffer.take_str();
        }

        const CharT * data() const
        {
            return m_buffer.data();
        }

        size_t size() const
        {
            return m_buffer.size();
        }

        void clear()
        {
            m_buffer.clear();
        }

    private:
        buffer m_buffer;
    };

    using outstringstream = basic_outstringstream<char>;
    using woutstringstream = basic_outstringstream<wchar_t>;
}
ceztko