0

There are two ways that I know of to read file contents to a C-style string. The reason I need a C-style string and not a unique_ptr<string> is because I will be passing this data to a C function, gumbo_parse() whose parameter is a C-style string, and this way I avoid the overhead of a unique_ptr<string>. I am open to arguments against my decision.

std::ifstream ifs(bf::path("..."));                                              // Input file stream

First way:

std::filebuf * ptr_fbuff = ifs.rdbuf();                                         // Buffer to file contents.
std::size_t fbuff_size = ptr_fbuff->pubseekoff(0, ifs.end, std::ios_base::in);  // Get offset from 0 to end of buffer.
ptr_fbuff->pubseekpos(0, std::ios_base::in);                                    // Move buffer position back to beginning.
char * buffer = new char[fbuff_size];                                           // Buffer to store file data.
ptr_fbuff->sgetn(buffer, fbuff_size);                                           // Get file data.

Second way:

std::stringstream sstm_buffer;                                                  // Buffer to file contents.
sstm_buffer << ifs.rdbuf();                                                     // Read file contents to sstm.
const std::string & str_buffer = sstm_buffer.str();                             // Get underlying string from sstm.
char * buffer = new char[str_buffer.size()];                                    // Buffer to store file data.
str_buffer.copy(buffer, str_buffer.size());                                     // Copy file contents to buffer.

Do stuff:

GumboOutput * ptr_outpt = gumbo_parse(buffer);
//...

Close file:

gumbo_destroy_output(&kGumboDefaultOptions, ptr_outpt);
ifs.close();                                                                    // Close stream to file.
delete[] buffer;                                                                // Delete allocated buffer.

What are the differences between the two in terms of memory cost and speed. Clearly one uses a Stringstream as a buffer before putting the contents into the C-style string, and the other uses a filebuf before putting the contents into a C-style string.

Now to my second question. Ownership semantics and memory allocation. Does the ifstream allocate any memory in the heap for the buffer returned from rdbuf? Am I responsible for deleting the buffer returned from rdbuf? If it doesn't, say I'm reading a huge file... isn't that potentially a LOT of data to be storing in the stack?

Edit:

std::unique_ptr<std::string> mytype::my_func(const std::string & path)
{
    std::ifstream ifs(path);                                            // Input file stream
    std::stringstream sstm_buffer;                                      // Buffer to file contents.
    sstm_buffer << ifs.rdbuf();                                         // Read file contents to sstm.
    ifs.close();                                                        // Close stream to file.
    auto ptr_buffer = std::make_unique<std::string>(sstm_buffer.str()); // Pointer to copy of sstm buffer contents.
    return std::move(ptr_buffer);                                       // Return pointer.
}

Edit2:

std::string mytype::my_func(const std::string & path) const
{
    std::ifstream ifs(path);
    std::stringstream sstm_buf;
    sstm_buf << ifs.rdbuf();
    ifs.close();
    return sstm_buf.str();
}

Edit3:

std::string psclient::file_get(const std::string & path) const
{
    std::ifstream ifs(path);    // Stream to file (Automatically close() on ~).
    std::ostringstream reader;  // Stream to read file contents.
    reader << ifs.rdbuf();      // Read in file contents.
    return reader.str();        // Return a move instead of copy (Done implicitly).
}
Francisco Aguilera
  • 3,099
  • 6
  • 31
  • 57
  • What "overhead" are you trying to avoid exactly? A `unique_ptr` basically just destroys an object when it goes out of scope, which you need to do regardless. – David Schwartz Mar 26 '15 at 23:12
  • 1
    The function takes a C-style pointer, so there would be no point of using a unique_ptr as I would have to release it to give the parser a copy of the underlying string's c_str(). So right there instead of giving it the char * directly I would have a useless unique_ptr with a string, whose c_str() buffer I would need to copy. – Francisco Aguilera Mar 26 '15 at 23:14
  • 1
    Does the function take ownership of the string? If not, there's no overhead and no need to copy in any case. – David Schwartz Mar 26 '15 at 23:15
  • 2
    @FranciscoAguilera: Why a copy of the strings c_str()? Why not just give it the c_str()? – Mooing Duck Mar 26 '15 at 23:16
  • 2
    If you're considering using `new char[]`, then clearly the function you're passing the string to doesn't need to manage the lifetime of the string. Your code is going to do that. So why not use a `unique_ptr` to do that -- that's what it's for. There's no overhead and no need to copy anything. Then `unique_ptr` is just a convenient RAII wrapper to put the `delete` in the right place for you automatically. – David Schwartz Mar 26 '15 at 23:19
  • @DavidSchwartz The api I linked for the parser says "The buffer must live at least as long as the parse tree, as some fields (eg. original_text) point directly into the original buffer." So clearly, the parser does not take ownership. The files I read are huge so I did not think the stack was an appropriate place to store them. – Francisco Aguilera Mar 26 '15 at 23:22
  • @DavidSchwartz, how would you pass a unique_ptr to the `gumbo_parse()` function? You would need to call release, invalidating the point of using a unique_ptr in the first place? – Francisco Aguilera Mar 26 '15 at 23:24
  • 1
    @FranciscoAguilera No, you don't call `release`. You just use the pointer to call `c_str` on the underlying `string`. *You* are managing the lifetime of the object -- why would you want to call `release`?! – David Schwartz Mar 26 '15 at 23:25
  • @MooingDuck, it is my understanding that `c_str()` returns a temporary copy of the contents of a `std::string`, so if I passed it to the function, there's a chance that the temporary may not be available later down. – Francisco Aguilera Mar 26 '15 at 23:25
  • 1
    @FranciscoAguilera Right, that's why *you are managing the lifetime*. That's why you *wouldn't* call `release`. The function you are calling does *not* take ownership, you keep it. Using a `unique_ptr` makes it easy for you to precisely define the point where it becomes invalid -- you use the scope of the `unique_ptr`. – David Schwartz Mar 26 '15 at 23:25
  • @DavidSchwartz ok, I wouldn't have to call release then, I would just give the function the `&(foo.c_str())`, but then my other comment stands. "it is my understanding that c_str() returns a temporary copy of the contents of a std::string, so if I passed it to the function, there's a chance that the temporary may not be available later down." – Francisco Aguilera Mar 26 '15 at 23:27
  • 1
    @FranciscoAguilera Right, so you manage the lifetime to ensure it stays valid. A `unique_ptr` is a great way to do that. (No `&` though, just `foo.c_str()`.) – David Schwartz Mar 26 '15 at 23:27
  • 1
    @FranciscoAguilera: The pointer returned by `c_str()` remains valid as long as the source string is valid, and you don't call any non-const member functions on it. It also doesn't generally involve copying any data, it just refers directly to the string contents (which is why it can be invalidated by modifying operations on the string). – Benjamin Lindley Mar 26 '15 at 23:33
  • 2
    I fail to understand why a pointer (smart or otherwise) is even necessary here... – Julian Mar 26 '15 at 23:33
  • @AtlasC1, see the definition linked for `gumbo_parse()`. – Francisco Aguilera Mar 26 '15 at 23:40
  • 1
    @FranciscoAguilera Indeed, it accepts a `const char*`, which is exactly what `std::string::c_str()` returns. Making your string a pointer is superfluous. – Julian Mar 26 '15 at 23:42
  • @AtlasC1, well, like I mentioned in my post, I don't feel comfortable holding all the data from the file on the stack, so in order to place it in the heap, you need a pointer. Hence, `std::unique_ptr`. I'm still confused tho as to whether the ifstream is reading all of the file contents onto a buffer in the stack since my `std::stringstream` is on the stack. – Francisco Aguilera Mar 26 '15 at 23:47
  • The string object itself is on the stack, but `std::string` allocates its internal memory used to store the character data on the heap by default. – Julian Mar 26 '15 at 23:50
  • @AtlasC1 it does???? News to me. What about `std::stringstream`? See the edit on my post. – Francisco Aguilera Mar 26 '15 at 23:51
  • @AtlasC1 Also, in order to return the string from the function without making a copy I would need to allocate it in the heap, no? – Francisco Aguilera Mar 26 '15 at 23:56
  • Return value optimization should take care of that for you. Just `return sstm_buffer.str()` :) – Julian Mar 27 '15 at 00:01
  • @AtlasC1 return value optimiwhaaat :) So, in my second edit above, you're telling me a copy wont be made in the return value? Oh, do you mean like [here](http://stackoverflow.com/questions/4316727/returning-unique-ptr-from-functions), the compiler will move the contents implicitly instead of making a copy? – Francisco Aguilera Mar 27 '15 at 00:04
  • Basically, yes. Your second edit above looks good, although you may omit `ifs.close();`, because the file stream will automatically be closed when its destructor is called (i.e., when your function returns). – Julian Mar 27 '15 at 00:23
  • @AtlasC1 alright, last edit should be correct, then. Learned a lot thanks again. You should try to compile this discussion into an answer and i'll mark it for you. – Francisco Aguilera Mar 27 '15 at 00:38
  • @AtlasC1 There's not much distinction here between using a `shared_ptr` as the RAII container or the object itself. I was mostly addressing the erroneous objection that a reason not to use a `shared_ptr` (over some other form of dynamic allocation) is mythical "overhead" from the RAII container. – David Schwartz Mar 27 '15 at 05:53

2 Answers2

2

The reason I need a C-style string and not a unique_ptr is because I will be passing this data to a C function

You can still get a C-style pointer, for example

#include <iostream>
#include <string>
#include <memory>

void print_C_style_string(const char* psz)
{
    printf("str = %s", psz);
}

int main()
{
    std::unique_ptr<std::string> p{new std::string{"This is a string!"}};
    print_C_style_string(p.get()->c_str());
    return 0;
}

There is no overhead with unique_ptr. From "The C++ Programming Language" by Bjarne Stroustrup:

unique_ptr is a very lightweight mechanism with no space or time overhead compared to correct use of a built-in pointer.

Simple way of reading file line-by-line into string:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    ifstream fin("my_file.txt", ios::in);
    if(!fin){
        printf("Error opening file.");
        return 1;
    }

    while(fin.peek() != EOF){
        string s;
        getline(fin, s);
        cout << s << std::endl;
    }

    fin.close();

    return 0;
}
jensa
  • 2,792
  • 2
  • 21
  • 36
  • +1 A `unique_ptr` is just a very light RAII wrapper to make it easier to manage the lifetime of the string/buffer. This is the C++ way. – David Schwartz Mar 26 '15 at 23:29
  • Ok, what bout my first question, the two different ways of reading the file into a `std::string`, or `std::unique_ptr` in this case. – Francisco Aguilera Mar 26 '15 at 23:29
  • Yeah, it's goo to use that rather than "naked" new/delete. – jensa Mar 26 '15 at 23:30
  • 1
    @FranciscoAguilera My question is - why are you wrapping your string in a pointer in the first place? Why not just instantiate a string on the stack, and pass its `c_str()`? – Julian Mar 26 '15 at 23:32
1

When you're writing modern C++, there is a particular set of cases that require the use of pointers, but this isn't one of them. You should always prefer not to use them, when you don't have to. Return value optimization (RVO) and RAII are wonderful things!

By default, an std::string created on the stack uses the heap to store its internal character data; so there's no need to worry limited stack memory in your situation. Additionally, std::string::c_str() will return the string's internal const char* data, which is exactly what gumbo_parse() accepts as an argument.

This simplifies the code required to read your file. RVO will prevent unnecessary copies, increasing performance:

std::string get_file_contents(const std::string& filename)
{
    std::ifstream ifs{filename}; // Open the input file stream
    std::ostringstream ss;       // stringstream to retrieve file data
    ss << ifs.rdbuf();           // Read file data into ss buffer
    return ss.str();             // Return the ss buffer as string (RVO)
}

Now you don't have to worry about explicitly freeing any memory. You can use the above function and pass the resulting string's c_str() to gumbo_parse():

const std::string file_contents{get_file_contents("input.txt")};

GumboOutput* output = gumbo_parse(file_contents.c_str());

// ...

gumbo_destroy_output(&kGumboDefaultOptions, output);

If you really did want to experiment with smart pointers somewhere, you could try wrapping your GumboOutput in one with a custom deleter; something like this:

using namespace std::placeholders;  // for _1, _2, _3 ...
auto cleanup = std::bind(gumbo_destroy_output, &kGumboDefaultOptions, _1);
std::unique_ptr<int, decltype(cleanup)> output{
    gumbo_parse(file_contents.c_str()), cleanup };

The unique_ptr will now automatically call gumbo_destroy_output on its managed GumboOutput when it is destroyed (i.e., when it goes out of scope). Use this at your own discretion.

Julian
  • 1,688
  • 1
  • 12
  • 19