
Suppose we have std::string s holding a raw data buffer, but we want std::vector<uint8_t> v instead. The buffer length is in the millions. Is there a simple way to let v steal s's memory and thereby avoid copying the buffer?

Like what std::vector<uint8_t>::vector(std::string&&) would have done, but doing it somehow from the outside of STL.

Alternatively, is it possible to get v from std::stringstream ss with an operation about as efficient as ss.str()?

Museful
  • If the vector and string both have a custom allocator you might be able to do something like that – DarthRubik Jun 23 '18 at 20:17
  • Why not just use a vector as a buffer in the first place? –  Jun 23 '18 at 20:18
  • @NeilButterworth Based on the last bit, presumably because the data is in an `std::stringstream` and it's hard to turn one of those into an `std::vector`. – Daniel H Jun 23 '18 at 20:25
  • @NeilButterworth Because it comes from `stringstream< – Museful Jun 23 '18 at 20:25
  • @Museful Why not just use `fread`? – Daniel H Jun 23 '18 at 20:26
  • @DanielH It's in a library I wrote a long time ago. I probably didn't want to contaminate my beautiful code with C. – Museful Jun 23 '18 at 20:30
  • Can you not change the part used to get the `stringstream`, only what you do with it after that? I'm pretty sure you could create a vector of the right size, then either `fread` into it or use `pubsetbuf` from the result of `rdbuf` to read into it. – Daniel H Jun 23 '18 at 20:38
  • Or use std::istream::read(). –  Jun 23 '18 at 20:39
  • @NeilButterworth Or that, if you want to do it the reasonable way instead of the overcomplicated way I came up with because I don't do much IO stuff. – Daniel H Jun 23 '18 at 20:43
  • @NeilButterworth But then I have to first determine the size of the stream. Is there a trivial way to do that? – Museful Jun 23 '18 at 21:14
  • Depends where you are consuming the stream from. –  Jun 23 '18 at 21:16
  • Why can't you just keep the data in a `std::string`? – Brian Bi Jun 23 '18 at 21:18
  • @NeilButterworth A file. I just want to read a full file into memory, and I remember that `(stringstream< – Museful Jun 23 '18 at 21:21
  • Oh, then it's pretty easy - get the length of the file - see https://stackoverflow.com/questions/5840148/how-can-i-get-a-files-size-in-c, allocate a suitable sized vector and read into it. But I think you are obsessing too much about so-called "intent". strings and vectors of bytes are practically equivalent. –  Jun 23 '18 at 21:24
  • @Brian Because I have to pass it around and I'm getting tired of pretending the arguments to my functions are `std::string` when they are actually binary data buffers. – Museful Jun 23 '18 at 21:25
  • std::strings deal with "binary" data just fine. –  Jun 23 '18 at 21:26
  • In the worst case you could implement `vecbuf`, or find an implementation of it, in a way similar to [`basic_stringbuf`](https://en.cppreference.com/w/cpp/io/basic_stringbuf), but I think that would be going too far. – Daniel H Jun 23 '18 at 21:26
  • Unless you need to insert at the end of the buffer after reading the file, there isn't much advantage to a `std::string` or `std::vector` over just a `char*`, as long as you keep the string around somewhere so there aren't lifetime issues (which you would need to do anyway). – Daniel H Jun 23 '18 at 21:29
  • @NeilButterworth _get the length of the file..._ There might be a (minor) issue on Windows since one normally wants linebreaks translated by `fread`. – Paul Sanders Jun 24 '18 at 16:31
  • @Paul He says he is reading the data as binary, not text, so he doesn't want them translated. –  Jun 24 '18 at 16:33
  • @Neil Quite right, sorry. On that basis, posted some code. – Paul Sanders Jun 24 '18 at 17:05
  • this sounded similar to your [other post](https://stackoverflow.com/questions/51014963/why-doesnt-stdstringstreamstringstreamstdstring-exist/51015120#51015120) – Joseph D. Jun 25 '18 at 14:55
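
The suggestion running through the comments above (find the file size, allocate a vector of that size, then `std::istream::read()` straight into it) might look roughly like the sketch below. It is only an illustration: `read_file_bytes` is a made-up name, and it assumes a regular file opened in binary mode, where seeking to the end and calling `tellg()` gives the size in bytes on mainstream platforms. It does not guard against the file changing size while it is being read.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): reads a whole file into a byte vector.
// Returns std::nullopt if the file cannot be opened, sized, or fully read.
std::optional<std::vector<std::uint8_t>> read_file_bytes(const std::string& path)
{
    // std::ios::ate opens the stream positioned at the end, so tellg() reports the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return std::nullopt;

    const std::streamoff end = in.tellg();
    if (end < 0)
        return std::nullopt;

    // One allocation of exactly the file size, then rewind to the start.
    std::vector<std::uint8_t> bytes(static_cast<std::size_t>(end));
    in.seekg(0, std::ios::beg);

    // read() takes char*, so the uint8_t storage is accessed through a char pointer.
    // A short read sets failbit, which makes the stream test false here.
    if (!bytes.empty() &&
        !in.read(reinterpret_cast<char*>(bytes.data()),
                 static_cast<std::streamsize>(bytes.size())))
        return std::nullopt;

    return bytes;
}
```

This reads the data once, directly into the vector's storage, so there is no intermediate `std::string` or `std::stringstream` whose buffer would need to be stolen.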

1 Answer


OK, lots of comments there, let's try and put something together, because I need the practise and might maybe earn some points [update: didn't :(].

I'm fairly new to 'modern C++' so please take it as you find it. It needs C++17, since it uses std::optional. Any critique is more than welcome, but I would prefer to edit my own post. And please bear in mind when reading this that what the OP actually wants to do is read his bytes from a file. Thx.

Update: Tweaked to handle the case where the file size changes between the call to stat() and the call to fread(), as per @Deduplicator's comment below ... and subsequently replaced fread with std::ifstream, I think we're there now.

#include <string>
#include <vector>
#include <optional>
#include <iostream>
#include <fstream>

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

using optional_vector_of_char = std::optional <std::vector <char>>;

// Read file into a std::optional <std::vector <char>>.
// Now handles the file size changing when we're not looking.
optional_vector_of_char SmarterReadFileIntoVector (std::string filename)
{
    for ( ; ; )
    {
        struct stat stat_buf;
        int err = stat (filename.c_str (), &stat_buf);
        if (err)
        {
            // handle the error
            return optional_vector_of_char ();   // or maybe throw an exception
        }

        size_t filesize = stat_buf.st_size;

        std::ifstream fs;
        fs.open (filename, std::ios_base::in | std::ios_base::binary);
        if (!fs.is_open ())
        {
            // handle the error
            return optional_vector_of_char ();
        }

        optional_vector_of_char v (filesize + 1);
        std::vector <char>& vecref = v.value ();
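        fs.read (vecref.data (), filesize + 1);

        // We ask for filesize + 1 bytes, so read () normally hits end-of-file,
        // which sets eofbit and failbit together. Only failure without eofbit
        // (or badbit) is a real error.
        if (fs.bad () || (fs.fail () && !fs.eof ()))
        {
            // handle the error
            return optional_vector_of_char ();
        }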

        size_t bytes_read = fs.gcount ();
        if (bytes_read <= filesize)              // file same size or shrunk, this code handles both
        {
            vecref.resize (bytes_read);
            vecref.shrink_to_fit ();
            return v;                            // NRVO, or a move if not elided
        }

        // File has grown, go round again
    }
}    

int main ()
{
    optional_vector_of_char v = SmarterReadFileIntoVector ("abcde");
    std::cout << std::boolalpha << v.has_value () << std::endl;
}

Live demo. No actual file available to read in of course, so...


Also: Have you considered writing your own simple container that maps a view of the file? Just a thought.
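
For what it's worth, here is a rough sketch of what such a container might look like on a POSIX system, using `mmap`. Treat it as an assumption-laden illustration rather than a finished class: `MappedFile` and its members are names I made up, it is read-only and non-copyable, and on Windows you would need the file-mapping API instead.

```cpp
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only view of a file's bytes, backed by POSIX mmap.
class MappedFile
{
public:
    explicit MappedFile (const std::string& path)
    {
        const int fd = ::open (path.c_str (), O_RDONLY);
        if (fd < 0)
            return;                              // ok () stays false

        struct stat st;
        if (::fstat (fd, &st) == 0)
        {
            ok_ = true;
            if (st.st_size > 0)                  // mmap rejects a zero length, so an empty file stays unmapped
            {
                void* p = ::mmap (nullptr, static_cast <std::size_t> (st.st_size),
                                  PROT_READ, MAP_PRIVATE, fd, 0);
                if (p == MAP_FAILED)
                    ok_ = false;
                else
                {
                    data_ = static_cast <const unsigned char*> (p);
                    size_ = static_cast <std::size_t> (st.st_size);
                }
            }
        }
        ::close (fd);                            // the mapping remains valid after close
    }

    ~MappedFile ()
    {
        if (data_)
            ::munmap (const_cast <unsigned char*> (data_), size_);
    }

    MappedFile (const MappedFile&) = delete;
    MappedFile& operator= (const MappedFile&) = delete;

    bool ok () const { return ok_; }
    const unsigned char* data () const { return data_; }
    std::size_t size () const { return size_; }
    const unsigned char* begin () const { return data_; }
    const unsigned char* end () const { return data_ + size_; }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
    bool ok_ = false;
};
```

Usage would be along the lines of `MappedFile m ("abcde"); if (m.ok ()) for (unsigned char c : m) { ... }`. The bytes are paged in by the OS on demand, so nothing is copied into a separate buffer, but the view is only valid for as long as the `MappedFile` object lives.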

Paul Sanders
  • There is an obvious TOCTOU bug as the file could change length between stat and open. Also, why don't you want to read as binary on non-Windows, as that seems to be the right thing according to your comment? – Deduplicator Jun 24 '18 at 20:27
  • @Deduplicator Indeed, silly me, although it doesn't sound like it applies to the OP. I have modified the code accordingly. As for point 2, there's no difference between binary files and text files on platforms other than Windows because there's no need to translate line endings. Specifying "rb" (or "rt") for the mode parameter is a Windows extension. – Paul Sanders Jun 25 '18 at 09:27
  • `b` is not a windows extension, even though it does nothing on Unix, there are earlier systems where binary files and text files were different. See also https://en.cppreference.com/w/c/io/fopen – Deduplicator Jun 25 '18 at 10:43
  • @Deduplicator OK, I see there _File access mode flag "b" can optionally be specified to open a file in binary mode. This flag has no effect on POSIX systems, but on Windows it disables special handling of '\n' and '\x1A'._ I was concerned that some platforms might object to `"rb"`. Anyway, moot now, I changed the code to use `ifstream`, which is better all round. – Paul Sanders Jun 25 '18 at 11:30