31

I'm not exactly a C++ newbie, but I have had few serious dealings with it in the past, so my knowledge of its facilities is rather sketchy.

I'm writing a quick proof-of-concept program in C++ and I need a dynamically sizeable buffer of binary data. That is, I'm going to receive data from a network socket and I don't know how much there will be (although not more than a few MB). I could write such a buffer myself, but why bother if the standard library probably has something already? I'm using VS2008, so some Microsoft-specific extension is just fine by me. I only need four operations:

  • Create the buffer
  • Write data to the buffer (binary junk, not zero-terminated)
  • Get the written data as a char array (together with its length)
  • Free the buffer

What is the name of the class/function set/whatever that I need?

Added: Several votes go to std::vector. All nice and fine, but I don't want to push several MB of data byte-by-byte. The socket will give me data in chunks of a few KB each, so I'd like to write each chunk all at once. Also, at the end I will need to get the data as a simple char*, because I will need to pass the whole blob along to some Win32 API functions unmodified.

Jacob
Vilx-
  • 3
    You are NOT pushing byte-by-byte. You can insert a block of data at the end of the vector. – Joe Dec 09 '09 at 15:07

10 Answers

47

You want a std::vector:

std::vector<char> myData;

vector will automatically allocate and deallocate its memory for you. Use push_back to add new data (vector will resize for you if required), and the indexing operator [] to retrieve data.

If at any point you can guess how much memory you'll need, I suggest calling reserve so that subsequent push_back calls won't have to reallocate as often.

If you want to read in a chunk of memory and append it to your buffer, easiest would probably be something like:

std::vector<char> myData;
for (;;) {
    const int BufferSize = 1024;
    char rawBuffer[BufferSize];

    const unsigned bytesRead = get_network_data(rawBuffer, sizeof(rawBuffer));
    // bytesRead is unsigned, so test for zero (no more data) rather than <= 0
    if (bytesRead == 0) {
        break;
    }

    // append the whole chunk in one call
    myData.insert(myData.end(), rawBuffer, rawBuffer + bytesRead);
}

myData now has all the data, read chunk by chunk. However, every byte is copied twice: once into rawBuffer and once more into the vector.

Instead, we can try something like this:

std::vector<char> myData;
for (;;) {
    const int BufferSize = 1024;

    // make room for one more chunk at the end of the vector
    const size_t oldSize = myData.size();
    myData.resize(oldSize + BufferSize);

    // read straight into the new space, then trim to what was actually read
    const unsigned bytesRead = get_network_data(&myData[oldSize], BufferSize);
    myData.resize(oldSize + bytesRead);

    if (bytesRead == 0) {
        break;
    }
}

This reads directly into the vector's storage, at the cost of occasionally over-allocating.

This can be made smarter by e.g. doubling the vector size for each resize to amortize resizes, as the first solution does implicitly. And of course, you can reserve() a much larger buffer up front if you have a priori knowledge of the probable size of the final buffer, to minimize resizes.

Both are left as an exercise for the reader. :)
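
For what it's worth, here is one possible take on that exercise: a minimal sketch that starts from a guessed size, doubles the vector whenever it is full, and trims it at the end. FakeSource and its read() member are hypothetical stand-ins for the socket, invented only so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical data source standing in for a socket (not a real API).
struct FakeSource {
    std::vector<char> bytes;
    std::size_t cursor;
    explicit FakeSource(std::size_t total) : bytes(total, 'x'), cursor(0) {}

    // Copies at most `capacity` bytes into `dest`; returns 0 at end of data.
    std::size_t read(char* dest, std::size_t capacity) {
        std::size_t n = bytes.size() - cursor;
        if (n > capacity) n = capacity;
        if (n > 0) std::memcpy(dest, &bytes[cursor], n);
        cursor += n;
        return n;
    }
};

// Reads the whole source directly into the vector. `used` counts the valid
// bytes; the vector's size is the space currently available for writing.
std::vector<char> read_all(FakeSource& source, std::size_t expectedSize) {
    assert(expectedSize > 0);
    std::vector<char> data(expectedSize);   // initial guess
    std::size_t used = 0;
    for (;;) {
        if (used == data.size()) {
            data.resize(data.size() * 2);   // full: double the space
        }
        const std::size_t n = source.read(&data[used], data.size() - used);
        if (n == 0) break;
        used += n;
    }
    data.resize(used);                      // trim to the bytes received
    return data;
}
```

Doubling keeps the number of reallocations logarithmic in the final size, at the price of up to 2x over-allocation until the final trim.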

Finally, if you need to treat your data as a raw array:

some_c_function(myData.data(), myData.size());

std::vector is guaranteed to be contiguous.
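
Putting the pieces together, a self-contained sketch of the chunked-append approach might look like the following. fake_recv and count_nul_bytes are hypothetical stand-ins (for the socket read and for a C API that takes a pointer plus a length); neither is a real library function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Sample payload with embedded NUL bytes, to show the data is not
// treated as a zero-terminated string.
static const char kMessage[] = "bin\0ary\0junk";
static const std::size_t kMessageSize = sizeof(kMessage) - 1;  // 12 bytes

// Hypothetical stand-in for a socket read: copies up to `capacity` bytes
// of kMessage starting at `cursor`; returns 0 once everything was delivered.
std::size_t fake_recv(char* dest, std::size_t capacity, std::size_t& cursor) {
    assert(dest != 0);
    std::size_t n = kMessageSize - cursor;
    if (n > capacity) n = capacity;
    std::memcpy(dest, kMessage + cursor, n);
    cursor += n;
    return n;
}

// Hypothetical stand-in for a C function taking pointer + length.
std::size_t count_nul_bytes(const char* data, std::size_t length) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (data[i] == '\0') ++count;
    }
    return count;
}

// Reads everything from the fake source in 5-byte chunks.
std::vector<char> read_all() {
    std::vector<char> data;
    std::size_t cursor = 0;
    char chunk[5];
    for (;;) {
        const std::size_t n = fake_recv(chunk, sizeof(chunk), cursor);
        if (n == 0) break;
        data.insert(data.end(), chunk, chunk + n);  // append the whole chunk
    }
    return data;
}
```

The result can then be handed to the C-style function as count_nul_bytes(&data[0], data.size()), provided the vector is not empty.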

GManNickG
  • OK, but I don't see a member with which I could add a whole buffer of data. Or do I have to push several MB byte-by-byte? I will read it from the socket in nice chunks of a few KB each. – Vilx- Dec 09 '09 at 14:48
  • 1
    Vilx -- use myData.insert(myData.end(), bytes_ptr, bytes_ptr + bytes_count) – atzz Dec 09 '09 at 14:53
  • Assuming that you have a buffer of known size, `vec.insert(vec.end(), buf, buf+length)` – KeithB Dec 09 '09 at 14:53
  • I don't see an `append()` member function on the vector. – Vilx- Dec 09 '09 at 14:54
  • 3
    Vector is required to be contiguous, so it is possible to take the address of an element and memcpy() a block of data into it. Feel free to shudder at the horror of this. – RobH Dec 09 '09 at 15:01
  • I shudder at the horror of this. – Vilx- Dec 09 '09 at 15:06
  • Taking the address of the first element is fairly common. Also, if you're reading network data and want it, you'll have to copy *somewhere*, which involves every byte. Some CPUs can copy multiple bytes at once, and your compiler will take advantage of that for you. – GManNickG Dec 09 '09 at 15:06
  • 4
    Why use intermediate buffer? Why not read network data directly into the vector? Resize the vector to its old size +N, receive maximum N bytes to &vector[old_vector_size]. – sbk Dec 09 '09 at 15:17
  • By default resize will zero initialise all elements so the second answer of reading directly into the vector is replacing the cost of copying data with zero initialisation. For better performance see https://stackoverflow.com/questions/21028299/is-this-behavior-of-vectorresizesize-type-n-under-c11-and-boost-container/21028912#21028912 to avoid the zero initialisation. – Martin Sherburn May 20 '21 at 13:55
10

std::string would work for this:

  • It supports embedded nulls.
  • You can append multi-byte chunks of data to it by calling append() on it with a pointer and a length.
  • You can get its contents as a char array by calling data() on it, and the current length by calling size() or length() on it.
  • Freeing the buffer is handled automatically by the destructor, but you can also call clear() on it to erase its contents without destroying it.
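
As a rough illustration of the points above, the sketch below appends binary chunks with an explicit length and reads the result back through data() and size(). The helper names append_chunk and count_nul_bytes are made up for this example; only the std::string members come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends one received chunk. Passing the length explicitly is what keeps
// embedded NUL bytes from truncating the data.
void append_chunk(std::string& buffer, const char* chunk, std::size_t length) {
    assert(chunk != 0);
    buffer.append(chunk, length);
}

// Hypothetical consumer that takes a plain pointer and a length.
std::size_t count_nul_bytes(const char* data, std::size_t length) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (data[i] == '\0') ++count;
    }
    return count;
}
```

A call such as append_chunk(buf, "ab\0cd", 5) stores all five bytes, and count_nul_bytes(buf.data(), buf.size()) sees the NUL in the middle.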
Wyzard
9
std::vector<unsigned char> buffer;

Every push_back adds a new char at the end (reallocating if needed). You can call reserve to minimize the number of allocations if you roughly know how much data you expect.

buffer.reserve(1000000);

If you already have the data in a plain array, you can construct the vector from it in one go:

unsigned char buffer[1000];
std::vector<unsigned char> vec(buffer, buffer + 1000);
Nikola Smiljanić
7

One more vote for std::vector. Minimal code that skips the extra copy GMan's code does:

std::vector<char> buffer;
static const size_t MaxBytesPerRecv = 1024;
size_t bytesRead;
do
{
    const size_t oldSize = buffer.size();

    buffer.resize(oldSize + MaxBytesPerRecv);

    // pseudocode: like the winsock recv() functions, receive() takes a buffer
    // and the maximum number of bytes to write to it
    bytesRead = receive(&buffer[oldSize], MaxBytesPerRecv);

    // shrink the vector; this is practically a no-op, as it only modifies
    // the internal size and no data is moved or freed
    buffer.resize(oldSize + bytesRead);
} while (bytesRead > 0);

As for calling WinAPI functions - use &buffer[0] (yeah, it's a little bit clumsy, but that's the way it is) to pass to the char* arguments, buffer.size() as length.

And a final note: you can use std::string instead of std::vector, and there shouldn't be any difference (except that you can write buffer.data() instead of &buffer[0] if your buffer is a string).

sbk
  • 1
    +1: if you choose vector, this is the way to do it. I still claim the vector here just used as a collection of {size, capacity, pointer} and you could just as easily call `realloc` yourself though ... – Useless Dec 09 '09 at 15:52
  • I claim C++ is just really some assembly instructions and you should use those. :P – GManNickG Dec 09 '09 at 15:57
  • Fair enough ;D I just don't think the vector is adding much abstraction or expressiveness here - although this may depend on the user/reader's level of comfort with C memory allocation. – Useless Dec 09 '09 at 16:34
  • 1
    @Useless: Ok then how about hassle free exception safe memory management? – Sandeep Datta Dec 09 '09 at 17:20
  • OK, good point: I'm used to idiomatic C code for low-level socket programming (and the POSIX sockets API doesn't throw), but it isn't either good style in general or idiomatic C++. – Useless Dec 09 '09 at 17:47
  • @sbk `myData.resize(oldSize + bytesRead);` should be `buffer.resize(oldSize + bytesRead);`...a small typo I think. – enthusiasticgeek May 11 '13 at 18:51
4

I'd take a look at Boost's basic_streambuf, which is designed for this kind of purpose. If you can't (or don't want to) use Boost, I'd consider std::basic_streambuf, which is quite similar but a little more work to use. Either way, you basically derive from that base class and override underflow() to read data from the socket into the buffer. You'll normally attach a std::istream to the buffer, so other code reads from it in much the same way as it would read user input from the keyboard (or whatever).
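
To make the idea concrete, here is a small sketch of the standard-library variant only (std::streambuf, not the Boost class). FakeSocket is a hypothetical in-memory stand-in for a real socket, and the 4-byte chunk size is deliberately tiny so that underflow() is called several times.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical chunked source standing in for a socket (not a real API).
class FakeSocket {
public:
    explicit FakeSocket(const std::string& payload)
        : payload_(payload), cursor_(0) {}

    // Copies at most `capacity` bytes into `dest`; returns 0 at end of data.
    std::size_t receive(char* dest, std::size_t capacity) {
        assert(dest != 0);
        std::size_t n = payload_.size() - cursor_;
        if (n > capacity) n = capacity;
        if (n > 0) std::memcpy(dest, payload_.data() + cursor_, n);
        cursor_ += n;
        return n;
    }

private:
    std::string payload_;
    std::size_t cursor_;
};

// Stream buffer that refills its get area from the source on demand.
class SocketStreamBuf : public std::streambuf {
public:
    explicit SocketStreamBuf(FakeSocket& socket) : socket_(socket) {
        setg(chunk_, chunk_, chunk_);  // empty get area: first read underflows
    }

protected:
    virtual int_type underflow() {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        const std::size_t n = socket_.receive(chunk_, sizeof(chunk_));
        if (n == 0) return traits_type::eof();
        setg(chunk_, chunk_, chunk_ + n);  // expose the freshly read bytes
        return traits_type::to_int_type(*gptr());
    }

private:
    FakeSocket& socket_;
    char chunk_[4];
};
```

With a SocketStreamBuf attached to a std::istream, the usual extraction operators work across chunk boundaries. Note that this variant streams the data rather than accumulating it, so it only fits if the consumer can read incrementally.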

Jerry Coffin
2

An alternative which is not from the STL but might be of use: Boost.Circular Buffer

1

Use std::vector, a growing array that guarantees the storage is contiguous (your third point).

Xavier Nodet
0

Regarding your comment "I don't see an append()", inserting at the end is the same thing.

vec.insert(vec.end,

0

If you do use std::vector, you're just using it to manage the raw memory for you. You could just malloc the biggest buffer you think you'll need, and keep track of the write offset/total bytes read so far (they're the same thing). If you get to the end ... either realloc or choose a way to fail.

I know, it isn't very C++y, but this is a simple problem and the other proposals seem like heavyweight ways to introduce an unnecessary copy.
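
For comparison, a hand-rolled version along those lines could look roughly like this. The RawBuffer struct and the raw_* function names are invented for this sketch, and the doubling growth policy is just one reasonable choice for the "realloc" branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Minimal growable byte buffer managed by hand.
struct RawBuffer {
    char* data;
    std::size_t used;      // write offset == total bytes stored so far
    std::size_t capacity;  // bytes currently allocated
};

// Allocates the initial block; returns false if malloc fails.
bool raw_init(RawBuffer& b, std::size_t capacity) {
    assert(capacity > 0);
    b.data = static_cast<char*>(std::malloc(capacity));
    b.used = 0;
    b.capacity = (b.data != 0) ? capacity : 0;
    return b.data != 0;
}

// Appends a chunk, growing with realloc when it does not fit.
// Returns false on allocation failure, leaving the buffer unchanged.
bool raw_append(RawBuffer& b, const char* chunk, std::size_t length) {
    if (length == 0) return true;
    if (length > b.capacity - b.used) {
        std::size_t newCapacity = b.capacity * 2;
        if (newCapacity < b.used + length) newCapacity = b.used + length;
        char* grown = static_cast<char*>(std::realloc(b.data, newCapacity));
        if (grown == 0) return false;  // old block is still valid
        b.data = grown;
        b.capacity = newCapacity;
    }
    std::memcpy(b.data + b.used, chunk, length);
    b.used += length;
    return true;
}

// Releases the memory and resets the bookkeeping.
void raw_free(RawBuffer& b) {
    std::free(b.data);
    b.data = 0;
    b.used = 0;
    b.capacity = 0;
}
```

Unlike the vector versions, nothing here cleans up automatically, so every exit path (including exceptions thrown by other code) has to reach raw_free.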

Useless
  • 1
    Well, that is basically what I want to do. I just wondered if there wasn't some built-in way for doing that already. – Vilx- Dec 10 '09 at 14:01
0

The question here is what you want to use the buffer for. If you want to keep structures containing pointers in it, the buffer has to stay fixed at the memory address where it was first allocated. To get around this, you have to use relative pointers and a fixup list for updating the pointers after the final allocation. This would be worth a class of its own. (I didn't find such a thing.)
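
A small sketch of the relative-pointer half of that idea follows (it does not implement a fixup list). Records stored in the buffer refer to each other by offset from the start of the buffer, so the links survive a reallocation. The Record layout and all function names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Marks the end of a chain of records.
static const std::size_t kNoRecord = static_cast<std::size_t>(-1);

// A record stored inside the buffer. `nextOffset` is relative to the start
// of the buffer, so it stays valid when the buffer moves in memory.
struct Record {
    int value;
    std::size_t nextOffset;
};

// Appends a record and returns its offset within the buffer.
std::size_t append_record(std::vector<char>& buffer, int value) {
    Record r;
    std::memset(&r, 0, sizeof(r));  // also zeroes any padding bytes
    r.value = value;
    r.nextOffset = kNoRecord;
    const std::size_t offset = buffer.size();
    const char* bytes = reinterpret_cast<const char*>(&r);
    buffer.insert(buffer.end(), bytes, bytes + sizeof(Record));
    return offset;
}

// Copies a record out of the buffer (memcpy avoids alignment assumptions).
Record load_record(const std::vector<char>& buffer, std::size_t offset) {
    assert(offset + sizeof(Record) <= buffer.size());
    Record r;
    std::memcpy(&r, &buffer[offset], sizeof(Record));
    return r;
}

// Makes the record at `from` point to the record at `to`.
void link_records(std::vector<char>& buffer, std::size_t from, std::size_t to) {
    Record r = load_record(buffer, from);
    r.nextOffset = to;
    std::memcpy(&buffer[from], &r, sizeof(Record));
}

// Follows the chain starting at `first` and sums the values.
int sum_chain(const std::vector<char>& buffer, std::size_t first) {
    int sum = 0;
    std::size_t at = first;
    while (at != kNoRecord) {
        const Record r = load_record(buffer, at);
        sum += r.value;
        at = r.nextOffset;
    }
    return sum;
}
```

Because only offsets are stored, the vector is free to grow and move; a real pointer is formed only at the moment of access, from the buffer's current base address.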