44

I was looking for a way to stuff some data into a string across a DLL boundary. Because we use different compilers, all our dll interfaces are simple char*.

Is there a correct way to pass a pointer into the dll function such that it is able to fill the string buffer directly?

string stringToFillIn(100, '\0');
FunctionInDLL( stringToFillIn.c_str(), stringToFillIn.size() );   // definitely WRONG!
FunctionInDLL( const_cast<char*>(stringToFillIn.data()), stringToFillIn.size() );    // WRONG?
FunctionInDLL( &stringToFillIn[0], stringToFillIn.size() );       // WRONG?
stringToFillIn.resize( strlen( stringToFillIn.c_str() ) );

The one that looks most promising is &stringToFillIn[0] but is that a correct way to do this, given that you'd think that string::data() == &string[0]? It seems inconsistent.

Or is it better to swallow an extra allocation and avoid the question:

vector<char> vectorToFillIn(100);
FunctionInDLL( &vectorToFillIn[0], vectorToFillIn.size() );
string dllGaveUs( &vectorToFillIn[0] );
markh44

9 Answers

26

I'm not sure the standard guarantees that the data in a std::string is stored as a char*. The most portable way I can think of is to use a std::vector, which is guaranteed to store its data in a contiguous chunk of memory:

std::vector<char> buffer(100);
FunctionInDLL(&buffer[0], buffer.size());
std::string stringToFillIn(&buffer[0]);

This will of course require the data to be copied twice, which is a bit inefficient.

CAdaker
  • 7
    In terms of efficiency, if you start using std::vector as a buffer, you're going to run into a different kind of performance problem, where each element of the vector is initialized one by one. If you reserve a 32K buffer (which isn't all that much), you will spend a considerable amount of CPU time initializing this buffer. If you just need a contiguous chunk of memory, you are far better off simply using an array from new char[], combined with std::unique_ptr or some other RAII pattern, and you're good to go. Don't use std::vector unless you absolutely need to initialize each element. – John Leidegren Mar 07 '13 at 12:55
  • 2
    Use the vector trick in http://stackoverflow.com/questions/11149665/c-vector-that-doesnt-initialize-its-members. – Marc Eaddy Jun 06 '13 at 16:26
  • 2
    "I'm not sure the standard guarantees that the data in a `std::string` is stored as a `char*`." It is guaranteed. `std::string` uses `char`. http://en.cppreference.com/w/cpp/string/basic_string – cambunctious Jul 13 '16 at 20:09
21

Update (2021): C++11 cleared this up and the concerns expressed here are no longer relevant.

After a lot more reading and digging around I've discovered that string::c_str and string::data could legitimately return a pointer to a buffer that has nothing to do with how the string itself is stored. It's possible that the string is stored in segments for example. Writing to these buffers has an undefined effect on the contents of the string.

Additionally, string::operator[] should not be used to get a pointer to a sequence of characters - it should only be used for single characters. This is because pointer/array equivalence does not hold with string.

What is very dangerous about this is that it can work on some implementations but then suddenly break for no apparent reason at some future date.

Therefore the only safe way to do this, as others have said, is to avoid any attempt to directly write into the string buffer and use a vector, pass a pointer to the first element and then assign the string from the vector on return from the dll function.

Qix - MONICA WAS MISTREATED
markh44
  • 38
    C++0x is changing strings to use contiguous memory – Patrick Jun 25 '09 at 11:11
  • 2
    What Patrick says. Also, Herb Sutter in 2008, while in the midst of discussing this with the C++0x working group, did not know of an implementation that wasn't contiguous: http://herbsutter.wordpress.com/2008/04/07/cringe-not-vectors-are-guaranteed-to-be-contiguous/ (and scroll down to the comments). – Steve Jessop Jun 25 '09 at 11:21
  • Interesting that he says &str[0] should give contiguous data, Stroustrup appears to say otherwise in "The C++ Programming Language Special Edition" 20.3.3 p585. Maybe I misunderstand it. Roll on C++0x to clean up the mess! We think we're relying on the not-technically-guaranteed behaviour in various parts of the code so we're going to add some tests that will break if our assumptions about string don't hold anymore. sigh. – markh44 Jun 25 '09 at 11:36
  • That point in 20.3.3 (in my 3rd edition) is about something slightly different - the array/pointer equivalence doesn't apply to vectors either. I don't think Sutter is contradicting Stroustrup, and in any case when writing a reference work you want to avoid where possible statements of the form "at the moment, all implementations that I've looked at do X" in favour of "the standard guarantees Y, don't assume X". It's a bit different when you're discussing future versions of the standard and/or writing actual code that only has to work on implementations Sutter knows about... – Steve Jessop Jun 25 '09 at 12:57
  • 1
    I agree with your "test it in real life" approach, though. If at the point of use you assert &s[0] + (s.size()-1) == &s[s.size()-1] then you should be good, although I suppose that it's possible through epic bad luck for that to be true even though the string is stored in multiple separate allocations. The guarantee given for vector is &v[n] == &v[0] + n for all n from 0 to length()-1. – Steve Jessop Jun 25 '09 at 13:03
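
A minimal sketch of the check described in the last comment, for what it's worth. The helper names are made up for illustration. It compares every index rather than just the last one, so a lucky match on the final character is ruled out. On a C++11 or later library the condition holds by definition, so this is only interesting on older toolchains.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): true if every character of s
// lives at &s[0] + i, which is what the vector guarantee spells out.
bool LooksContiguous(const std::string& s)
{
    if (s.empty()) return true;
    const char* base = &s[0];
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (&s[i] != base + i) return false;
    }
    return true;
}

// Call this before handing &s[0] to code that writes s.size() bytes.
void AssertContiguous(const std::string& s)
{
    assert(LooksContiguous(s));
    (void)s; // avoids an unused-parameter warning when NDEBUG strips the assert
}
```

Calling AssertContiguous(stringToFillIn) right before the DLL call turns a silent assumption into a debug-build failure.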
12

In C++98 you should not alter the buffers returned by string::c_str() and string::data(). Also, as explained in the other answers, you should not use the string::operator[] to get a pointer to a sequence of characters - it should only be used for single characters.

Starting with C++11, strings are guaranteed to use contiguous memory, so you can use &string[0] to access the internal buffer.
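
A rough sketch of what that can look like in C++11 and later. The FunctionInDLL body is a stand-in made up so the snippet compiles (the real one lives in the DLL), ReadFromDLL is a hypothetical wrapper name, and the whole thing assumes the DLL writes a NUL-terminated string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the DLL export: writes a NUL-terminated string
// into buf, never more than size bytes including the terminator.
void FunctionInDLL(char* buf, std::size_t size)
{
    assert(buf != nullptr);
    if (size == 0) return;
    std::strncpy(buf, "hello from dll", size - 1);
    buf[size - 1] = '\0';
}

// Hypothetical wrapper: lets the DLL fill the string's own buffer.
std::string ReadFromDLL(std::size_t capacity)
{
    std::string s(capacity, '\0');          // sets size(), not just capacity()
    if (!s.empty()) {
        FunctionInDLL(&s[0], s.size());     // contiguous storage since C++11
        s.resize(std::strlen(s.c_str()));   // trim to the text actually written
    }
    return s;
}
```

The resize at the end matters: without it the string keeps its original length, padded with NUL characters.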

Andrei Bozantan
8

Since C++11 gives contiguous memory guarantees, this 'hacky' method is very popular in production practice:

std::string stringToFillIn(100, 0);
FunctionInDLL(stringToFillIn.data(), stringToFillIn.size());
Orion Edwards
Brian Cannard
  • 1
    Thanks Orion Edwards for editing. Note that it is now applicable only to the C++17 standard. See http://en.cppreference.com/w/cpp/string/basic_string/data for details. – Brian Cannard Jul 26 '17 at 20:19
  • 3
    Just to expand on Brian's comment, C++17 added a non-const `data()` overload that specifically allows this behavior. – Marcus10110 Apr 22 '19 at 19:17
  • `stringToFillIn` will still have a length of 100 after the function call – Thomas Weller Nov 28 '22 at 15:17
3

I'd not construct a std::string and ship a pointer to its internal buffer across DLL boundaries. Instead I would use a simple char buffer (statically or dynamically allocated). After the call to the DLL returns, I'd let a std::string take over the result. It just feels intuitively wrong to let callees write into an internal class buffer.
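
Something along these lines, as a sketch only. The FunctionInDLL body is a made-up stand-in for the real export, CallWithPlainBuffer is a hypothetical name, and the 100-byte size is just the figure from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the real DLL export.
void FunctionInDLL(char* buf, std::size_t size)
{
    assert(buf != nullptr);
    if (size == 0) return;
    std::strncpy(buf, "result", size - 1);
    buf[size - 1] = '\0';
}

std::string CallWithPlainBuffer()
{
    char buffer[100] = {};                 // zero-filled, so always terminated
    FunctionInDLL(buffer, sizeof buffer);
    buffer[sizeof buffer - 1] = '\0';      // defensive: don't trust the callee
    return std::string(buffer);            // the string copies the result
}
```

The cost is one extra copy, in exchange for never exposing the string's internals to the other side.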

Toby Speight
rtn
3

Considering Patrick's comment, I would say it's OK, convenient, and efficient to write directly into a std::string. I would use &s.front() to get a char*, as in this mex example:

#include "mex.h"
#include <string>
void mexFunction(
    int nlhs,
    mxArray *plhs[],
    int nrhs,
    const mxArray *prhs[]
)
{
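    // Size the string first: reserve() only allocates capacity, so writing
    // through &ret.front() on an empty string is undefined behavior.
    const size_t len = mxGetN(prhs[0]);
    std::string ret(len + 1, '\0');               // room for the terminating NUL
    mxGetString(prhs[0], &ret.front(), len + 1);
    ret.resize(len);                              // drop the terminator slot
    mexPrintf("%s", ret.c_str());                 // don't pass data as the format string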
}
Roland Puntaier
2

You can use a char buffer held in a unique_ptr instead of a vector:

// allocate buffer
auto buf = std::make_unique<char[]>(len);
// read data
FunctionInDLL(buf.get(), len);
// initialize string
std::string res { buf.get() };

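Note that &str[0] and str.data() only give access to characters the string already holds, and writing through them never changes size(). Writing into an empty string this way does not work (it is undefined behavior):

#include <iostream>
#include <string>
#include <sstream>

int main()
{
    std::string str;
    std::stringstream ss;
    ss << "test string";
    ss.read(&str[0], 4);       // doesn't work: str is empty, nothing to write into
    ss.read(str.data(), 4);    // doesn't work either (and needs C++17's non-const data())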
    std::cout << str << '\n';
}

Live example.
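
For comparison, here is a sketch of the same read with the string sized up front, since the snippet above leaves str empty. ReadChars is a hypothetical helper name, and it assumes C++11 or later so that &str[0] points at contiguous storage.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads up to n characters from in, writing directly
// into the string's own buffer.
std::string ReadChars(std::istream& in, std::size_t n)
{
    std::string str(n, '\0');                       // give the string a size first
    if (n != 0) {
        in.read(&str[0], static_cast<std::streamsize>(n));
        const std::streamsize got = in.gcount();    // characters actually read
        assert(got >= 0 && static_cast<std::size_t>(got) <= n);
        str.resize(static_cast<std::size_t>(got));  // shrink to what arrived
    }
    return str;
}
```

With a stream holding "test string", ReadChars(ss, 4) gives back "test".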

isnullxbh
0

The standard part of std::string is the API and some of the behavior, not the memory layout of the implementation.

Therefore, if you're using different compilers, you can't assume they are the same, so you'll need to transport the actual data. As others have said, transport the chars and push them into a new std::string.

Toby Speight
Simeon Pilgrim
0

You all have already addressed the contiguity issue (i.e. it's not guaranteed to be contiguous), so I'll just mention the allocation/deallocation point. I've had issues in the past where I've allocated memory in DLLs (i.e. had a DLL return a string) that caused errors upon destruction (outside the DLL). To fix this you must ensure that your allocator and memory pool are consistent across the DLL boundary. It'll save you some debugging time ;)
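
One common way to keep things consistent (a sketch, not something spelled out above) is to have the DLL export a matching free function, so memory is always released by the module that allocated it. DllCreateMessage, DllFreeMessage and the other names below are hypothetical, and both "exports" are defined locally only so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical DLL export: allocates the result on the DLL's heap.
extern "C" char* DllCreateMessage()
{
    const char text[] = "made in dll";
    char* p = static_cast<char*>(std::malloc(sizeof text));
    if (p) std::memcpy(p, text, sizeof text);
    return p;
}

// Hypothetical DLL export: frees with the same runtime that allocated.
extern "C" void DllFreeMessage(char* p)
{
    std::free(p);
}

// Caller side: route destruction back through the DLL's free function.
struct DllMessageDeleter {
    void operator()(char* p) const { DllFreeMessage(p); }
};

std::string FetchMessage()
{
    std::unique_ptr<char, DllMessageDeleter> raw(DllCreateMessage());
    return raw ? std::string(raw.get()) : std::string();  // copy, then release
}

// Usage check: the copy outlives the DLL-owned buffer.
void Demo()
{
    const std::string msg = FetchMessage();
    assert(msg == "made in dll");
    (void)msg; // avoids an unused-variable warning when NDEBUG strips the assert
}
```

The caller only ever owns its own std::string copy, so no destructor on the calling side touches memory from the DLL's heap.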

Faisal Vali