32

This should be a pretty common need, yet I couldn't find any straightforward solution.

Basically I read in a file over the network into a stringstream. This is the declaration:

std::stringstream membuf(std::ios::in | std::ios::out | std::ios::binary);

Now I have some C library that wants direct access to the read chunk of the memory. How do I get that? Read only access is OK. After the C function is done, I dispose of the memorystream, no need for it.

str() copies the buffer, which seems unnecessary and doubles the memory.

Am I missing something obvious? Maybe a different STL class would work better.

Edit: Apparently, stringstream is not guaranteed to store its data contiguously. What is?

If I use vector<char>, how do I get the byte buffer?

Kugel
  • Since `vector` stores all of its elements contiguously, you can get the "buffer" as follows: `char* buffer = &vector_char.front();` – Steve Guidi Dec 09 '09 at 23:35
  • 1
    @Steve Guidi: Unless you know how many bytes are going to come in off the network, a `vector` may make more copies than a `std::stringstream` because it _has_ to reallocate and copy its data as you're appending to it to keep the storage contiguous. – CB Bailey Dec 10 '09 at 06:14
  • Does this help: http://stackoverflow.com/questions/132358/how-to-read-file-content-into-istringstream/138645#138645 – Martin York Dec 10 '09 at 06:43
  • @Kugel: If you're waiting for stuff across a network then one (or even two) local memory copies would usually disappear into background noise as far as performance measurements go. Have you measured `std::stringstream` performance and found it to be inadequate and if so what are your performance requirements? – CB Bailey Dec 10 '09 at 13:37
  • @Charles Performance is not that crucial. I just wanted an easy way to access the downloaded buffer without flushing it to disk. – Kugel Dec 10 '09 at 14:00
  • 1
    In that case I'd just extract a string from the `std::stringstream` and pass `.c_str()` to the function needing the read-only contiguous buffer. At two lines of code you're not going to get much simpler. – CB Bailey Dec 10 '09 at 15:21
  • @Steve Just a heads up: NEVER use `std::vector::front()` to access the underlying data. It may work, but is never portable. Use `&vector[0]` instead, that will ALWAYS do what you want. – rioki Feb 22 '10 at 13:40

4 Answers

14

You can take full control of the storage by writing the stream buffer yourself and attaching it to the stream:

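// std::stringstream::rdbuf() takes no arguments and hides the setter inherited from
// std::basic_ios, so attach the custom buffer (a std::streambuf*) to a plain std::iostream.
std::iostream membuf(yourVeryOwnStreamBuf);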

Your own buffer should be derived from basic_streambuf, and override the sync() and overflow() methods appropriately.

For your internal representation you could probably use something like vector< char >, and reserve() it to the needed size so that no reallocations and copies are done.

This implies you know an upper bound for the space needed in advance. But if you don't know the size in advance, and need a contiguous buffer in the end, copies are of course unavoidable.
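
As a rough illustration of the approach described above, here is a minimal sketch of a write-only stream buffer backed by a `std::vector<char>`. The names `VectorBuf` and `demo_vector_buf` are made up for this example. The sketch keeps no internal put area, so it overrides `overflow()` and `xsputn()` and has nothing to flush in `sync()`; a buffered variant would need `sync()` as the answer says.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <vector>

// Hypothetical name: a write-only stream buffer that appends into a std::vector<char>.
class VectorBuf : public std::streambuf {
public:
    // Reserving up front avoids reallocation for as long as the estimate holds.
    explicit VectorBuf(std::size_t expected_size = 0) { storage_.reserve(expected_size); }

    // Contiguous, read-only view of everything written so far.
    const char* data() const { return storage_.data(); }
    std::size_t size() const { return storage_.size(); }

protected:
    // No put area is ever set, so single-character writes arrive here.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            storage_.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Block writes such as ostream::write arrive here and are appended in one step.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        storage_.insert(storage_.end(), s, s + n);
        return n;
    }

private:
    std::vector<char> storage_;
};

// Hypothetical usage: write binary data, then inspect it through data()/size().
inline void demo_vector_buf() {
    VectorBuf buf(1024);
    std::ostream out(&buf);
    out.write("ab\0cd", 5);
    assert(buf.size() == 5);
    assert(buf.data()[2] == '\0' && buf.data()[4] == 'd');
}
```

The pointer returned by `data()` is only valid until the next write, since appending may reallocate the vector once the reserved capacity is exceeded.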

Pieter
  • 1
    `realloc` allows you to grow a buffer without copying by attempting to grow the allocation in place first. It often moves the data, but not 100% of the time, and I've found it far more efficient than "copy always". Rather than building on `vector`, just use a `char *` and `realloc` as the base for your custom stream class. – Erik Aronesty Apr 24 '15 at 13:55
  • I will appear to be lazy, but this is a question that would deserve an answer with the complete snippet as an answer with exactly the concept described here. I inherited `basic_streambuf` once and I won't do it one more time, sorry. – ceztko Feb 27 '18 at 17:00
12

std::stringstream doesn't (necessarily) store its buffer contiguously but can allocate chunks as it is gradually filled. If you then want all of its data in a contiguous region of memory then you will need to copy it and that is what str() does for you.

Of course, if you want to use or write a class with a different storage strategy then you can, but you don't then need to use std::stringstream at all.

CB Bailey
  • Charles, can you help me? I know that I can get the underlying stringbuf used by the stringstream with 'rdbuf()'. And I cannot find wording in the Standard saying that the stringbuf/stringstream can be non-contiguous. I'd appreciate any pointers you can give. Thanks. – Don Wakefield Dec 10 '09 at 00:02
  • 1
    Where does it say that `stringbuf` has to use contiguous storage? The standard requires it to store an underlying character sequence but doesn't specify how. The `streambuf` interface has `overflow` and `underflow` so doesn't require the stream to be made available as a single contiguous range and the only other requirements are the constructor from `std::string` and the `str` overloads all of which deal with copies of the underlying character sequence. I'd be (mildly) disappointed with any implementation that always used contiguous storage as the interface is designed for incremental appends. – CB Bailey Dec 10 '09 at 06:07
  • 1
    `rdbuf` works fine in all the test cases I've run, but if I were you I'd assert on some contiguity test (length of the rdbuf = number of chars in the stream) before using it. Stick that in your test suite and you're good to go for platform porting, etc. – Erik Aronesty May 04 '15 at 17:30
9

You can call str() to get back a std::string. From there you can call c_str() on the std::string to get a char*. Note that c_str() isn't officially supported for this use, but everyone uses it this way :)
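
A minimal sketch of this route, assuming a C function that takes a pointer and a length. `count_nul_bytes` is a made-up stand-in for the real library call, and `pass_to_c_library` is a hypothetical helper name. The copy returned by `str()` is bound to a named variable so that the pointer stays valid for the duration of the call.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Stand-in for the C library function (hypothetical name and signature):
// counts the NUL bytes in a read-only buffer.
extern "C" std::size_t count_nul_bytes(const char* data, std::size_t len) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (data[i] == '\0') ++n;
    return n;
}

std::size_t pass_to_c_library(const std::stringstream& membuf) {
    // str() returns a copy; a named variable keeps it alive while the pointer is in use.
    const std::string contents = membuf.str();
    // Pass the size explicitly: binary data may contain embedded NUL bytes.
    return count_nul_bytes(contents.data(), contents.size());
}

// Hypothetical usage with a binary stream like the one in the question.
inline void demo_pass_to_c_library() {
    std::stringstream membuf(std::ios::in | std::ios::out | std::ios::binary);
    membuf.write("ab\0cd", 5);
    assert(pass_to_c_library(membuf) == 1);
}
```

Writing `membuf.str().c_str()` in a single expression and storing the result would leave a dangling pointer, because the temporary string is destroyed at the end of that statement.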

Edit

This is probably a better solution: std::istream::read. From the example on that page:

  buffer = new char [length];

  // read data as a block:
  is.read (buffer,length);
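
A sketch of the same idea with a `std::vector<char>` owning the memory instead of `new char[]`, assuming the length is taken from the stream itself by seeking to the end. The helper name `copy_out` is made up for this example. It still makes one copy of the data, like `str()` does.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <vector>

// Hypothetical helper: copy the whole content of the stream into a contiguous vector.
std::vector<char> copy_out(std::stringstream& membuf) {
    // Find the number of bytes by seeking the read position to the end and back.
    membuf.seekg(0, std::ios::end);
    const std::streamoff length = membuf.tellg();
    membuf.seekg(0, std::ios::beg);

    std::vector<char> buffer;
    if (length > 0) {
        buffer.resize(static_cast<std::size_t>(length));
        // Read the data as one block.
        membuf.read(buffer.data(), static_cast<std::streamsize>(length));
        // Trim to what was actually read, in case the read came up short.
        buffer.resize(static_cast<std::size_t>(membuf.gcount()));
    }
    return buffer;
}

// Hypothetical usage with binary data containing an embedded NUL byte.
inline void demo_copy_out() {
    std::stringstream membuf(std::ios::in | std::ios::out | std::ios::binary);
    membuf.write("ab\0cd", 5);
    const std::vector<char> bytes = copy_out(membuf);
    assert(bytes.size() == 5);
    assert(bytes[2] == '\0' && bytes[4] == 'd');
}
```

The C function can then be given `bytes.data()` and `bytes.size()`.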
Alexis Wilke
  • 11
    Man, that's tripling the buffer. – Kugel Dec 09 '09 at 23:11
  • 1
    This however works. I will mark it as an answer if there is no better way in STL. – Kugel Dec 09 '09 at 23:23
  • 4
    `c_str()` is part of the standard interface of `std::basic_string`. Apart from the minor detail that it returns a `const char*` (and the question only asks for read-only access anyway), how is this usage of `c_str` not supported? – CB Bailey Dec 10 '09 at 06:11
  • 3
    When you say "everyone uses it this way", I think you used the wrong word - you meant "no-one". –  Dec 10 '09 at 10:31
  • regarding edit, I don't know the length beforehand to allocate the buffer. – Kugel Dec 10 '09 at 11:46
  • Actually I can count length by summing the incoming packets during streaming from network. – Kugel Dec 10 '09 at 14:28
  • @Kugel: Doesn't this imply that you have to store all the packets somewhere before sending them to the `stringstream`? In this case wouldn't you be better off just creating one big buffer and copying the data directly from the stored packets rather than using a `stringstream` intermediate? – CB Bailey Dec 10 '09 at 15:19
  • 1
    @Charles - You're right. I thought that because it was the internal representation of the string you aren't supposed to use it. But I checked the SGI docs (http://www.sgi.com/tech/stl/basic_string.html) and it doesn't say anything like that. @Neil - "no-one" must include me and many other people I've worked with in the past. Using const char* is bad form in C++, but a lot of C++ code either has to interface with C libraries or was written by someone who knew C and was trying to code in C++. This seemed to be extremely common to me. –  Dec 10 '09 at 16:28
  • I am not sure which str() function you folks are referring to. When I tried `const char *data = stringstream.rdbuf()->str().c_str()`, the data was around for a while but went away as the stack and heap were reused. It looks like you must do `string data = stringstream.rdbuf()->str()` to have the data stick around. Since I was using a binary stringstream, I went with the `stringstream.read` method. – gjpc Sep 22 '11 at 20:52
5

Well, if you are seriously concerned about storage, you can get closer to the metal. basic_stringstream has a method, rdbuf(), which returns its basic_stringbuf (which is derived from basic_streambuf). You can then use the eback(), egptr(), and gptr() pointers to access characters directly out of the buffer. I've used this machinery in the past to implement a custom buffer with my desired semantics, so it is doable.

Beware, this is not for the faint of heart! Set aside a few days, read Standard C++ IOStreams and Locales, or similar nitpicky reference, and be careful...
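
Because the pointer accessors are protected, they are only reachable from a derived class. The following is a minimal sketch under stated assumptions: `PeekableStringBuf` and `matches_str_copy` are made-up names, and the sketch uses the put-area pointers `pbase()` and `pptr()` rather than the get-area pointers named above, since the get area may lag behind writes. It assumes the stream is only appended to (no seeking on the write side) and that the implementation keeps the written data in one contiguous put area, so it includes a contiguity check against `str()` of the kind suggested in the comments on another answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical name: a stringbuf that exposes its put area for read-only access.
class PeekableStringBuf : public std::stringbuf {
public:
    PeekableStringBuf()
        : std::stringbuf(std::ios::in | std::ios::out | std::ios::binary) {}

    // Assumption: everything written so far lives in [pbase(), pptr()).
    const char* written_begin() const { return pbase(); }
    std::size_t written_size() const { return static_cast<std::size_t>(pptr() - pbase()); }
};

// Contiguity check: the exposed range must match the copy that str() returns.
inline bool matches_str_copy(const PeekableStringBuf& buf) {
    const std::string copy = buf.str();
    return copy.size() == buf.written_size() &&
           std::equal(copy.begin(), copy.end(), buf.written_begin());
}

// Hypothetical usage: attach the buffer to a plain std::iostream and write to it.
inline void demo_peekable() {
    PeekableStringBuf buf;
    std::iostream membuf(&buf);
    membuf.write("ab\0cd", 5);
    assert(matches_str_copy(buf));
    assert(buf.written_size() == 5);
}
```

Running `matches_str_copy` in a test suite on each target platform is a cheap way to catch an implementation for which the assumption does not hold.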

Don Wakefield
  • 2
    `eback`, `egptr` and `gptr` are protected so obtaining the `rdbuf` pointer for the `stringbuf` isn't going to give you access to these. Even if it did, I don't think that it's _guaranteed_ that you could create a contiguous copy of the underlying sequence with fewer copies than `.str()` and `.data()`. – CB Bailey Dec 10 '09 at 06:09
  • Okay. Since I was implementing my own derived buffer, I could access the methods. And I guess G++ just does contiguous access. I still couldn't find this restriction (non-contiguous) in the standard... – Don Wakefield Dec 10 '09 at 18:49
  • Thanks for pointing into gptr() however by default they are protected members: https://cplusplus.com/reference/streambuf/streambuf/egptr/ – Konstantin Burlachenko Nov 21 '22 at 09:07
  • 1
    Hi @KonstantinBurlachenko. CB Bailey already mentioned that, and I responded. See the previous two comments. Thanks for the pointer to a reference, though. – Don Wakefield Nov 21 '22 at 18:31