54

With vectors, one can assume that elements are stored contiguously in memory, allowing the range [&vec[0], &vec[vec.capacity()]) to be used as a normal array. E.g.,

vector<char> buf;
buf.reserve(N);
int M = read(fd, &buf[0], N);

But now the vector doesn't know that it contains M bytes of data, added externally by read(). I know that vector::resize() sets the size, but it also clears the data, so it can't be used to update the size after the read() call.

Is there a trivial way to read data directly into vectors and update the size after? Yes, I know of the obvious workarounds like using a small array as a temporary read buffer, and using vector::insert() to append that to the end of the vector:

char tmp[N];
int M = read(fd, tmp, N);
buf.insert(buf.end(), tmp, tmp + M);

This works (and it's what I'm doing today), but it just bothers me that there is an extra copy operation there that would not be required if I could put the data directly into the vector.

So, is there a simple way to modify the vector size when data has been added externally?

Andy Finkenstadt
user984228
  • 2
    Are you sure `&buf[0]` works in debug mode? For instance, on Visual Studio, in debug mode `std::vector::operator[]` performs a range check. So that expression will throw if `buf` is empty. – Praetorian Oct 07 '11 at 15:41
  • I use GCC, and I ran the program through valgrind to make sure that no memory errors occurred. All I can say is that with the GNU libstdc++ implementation, this works. &vec[0] seems to give you a direct pointer to reserved memory, no matter the size(). – user984228 Oct 07 '11 at 16:11
  • 1
    @user984228: if you're happy to rely on implementation details of GCC (which is a BAD IDEA (TM)), then you'd look at the source for its implementation of `vector`. You can see where it stores the `begin` and `end` pointers and capacity, and if you just overwrite the `end` pointer, I'm pretty sure that will change the size as you want. Just copy whatever the implementation of `resize()` does in the case where the capacity is big enough to start with, leaving out the memset/fill/whatever. You'll have to work around some `private` modifiers, of course, perhaps by hard-coding in the offsets. – Steve Jessop Oct 07 '11 at 16:24
  • 8
    @SteveJessop: I just died a little. – Matthieu M. Oct 07 '11 at 16:34
  • @Matthieu: quite. If all that stuff sounds like a bad idea, then hopefully relying on the fact that GCC appears to let you write into space that's only reserved, not resized, also sounds like a bad idea :-) – Steve Jessop Oct 07 '11 at 17:49
  • @SteveJessop: of course it does, but then I have been tainted by interoperability at a young (programmer) age since I began with Windows/Linux programs (and they are QUITE different environments :p) – Matthieu M. Oct 07 '11 at 18:16
  • @Praetorian The [C++ standard](https://en.cppreference.com/w/cpp/container/vector/operator_at) says _"No bounds checking is performed."_. So Visual Studio is obviously not obeying the standard. – darkdragon Jul 05 '22 at 11:15

5 Answers

28
vector<char> buf;
buf.reserve(N);
int M = read(fd, &buf[0], N);

This code fragment invokes undefined behavior. You can't write beyond size() elements, even if you have reserved the space.

The correct code looks like this:

vector<char> buf;
buf.resize(N);
int M = read(fd, &buf[0], N);
buf.resize(M);


PS. Your statement "With vectors, one can assume that elements are stored contiguously in memory, allowing the range [&vec[0], &vec[vec.capacity()]) to be used as a normal array" isn't true. The allowable range is [&vec[0], &vec[vec.size()]).
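
A slightly fuller sketch of the same pattern, assuming a POSIX system. read() can return -1 on error, and passing that straight to resize() would request an enormous size, so the return value is checked first. The helper name read_into is hypothetical, not part of any library. Note that shrinking with resize(M) keeps the first M bytes as they are; only growing a vector value-initializes the new elements.

```cpp
#include <cstddef>
#include <unistd.h>   // POSIX read()
#include <vector>

// Hypothetical helper: reads up to max_bytes from fd into buf, replacing its contents.
// Returns false if read() fails; buf is left empty in that case.
bool read_into(int fd, std::vector<char>& buf, std::size_t max_bytes) {
    buf.resize(max_bytes);                    // size() == max_bytes, so the whole range is writable
    const ssize_t m = ::read(fd, buf.data(), buf.size());
    if (m < 0) {                              // error: never pass a negative count to resize()
        buf.clear();
        return false;
    }
    buf.resize(static_cast<std::size_t>(m));  // shrinking keeps the first m bytes intact
    return true;
}
```

Using buf.data() rather than &buf[0] also sidesteps the empty-vector case, since data() may be called on a vector of any size.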
Robᵩ
  • 18
    There is no way to avoid the unnecessary initialization that the first resize() causes? – user984228 Oct 07 '11 at 15:35
  • +1 Of course this has the problem of unnecessarily initializing the chars before overwriting them, but in C++ it won't get any better. – Christian Rau Oct 07 '11 at 15:36
  • 3
    99% certainty the extra initialization will be dwarfed by the cost of your I/O anyway. – Mark B Oct 07 '11 at 15:39
  • Then I'd rather just use the temporary buffer + insert. It should be at least as efficient, and I don't have to worry about reallocation too often (or does resize() take the same steps to minimize the number of reallocations?) – user984228 Oct 07 '11 at 15:40
  • 2
    @user984228: The question is whether that is a problem. If you have measured and that initialization becomes a bottleneck (I would not expect that) then you might need to consider implementing your own data structure... Note: *if and only if*, I am not trying to have you implement your own data type, but rather realize that in most cases that will not be a performance bottleneck --i.e. wherever you are reading from is probably much slower than the cost of that initialization. – David Rodríguez - dribeas Oct 07 '11 at 15:40
  • 1
    @user984228: As for reallocations, if you `resize` to a given size, that will trigger a single allocation to a capacity at least as big as the size you requested. – David Rodríguez - dribeas Oct 07 '11 at 15:42
  • Btw, I used file I/O as an example, but the question was general. The external data could come from anywhere, possibly not much more expensive to generate than a simple memset(buf, 0, N). – user984228 Oct 07 '11 at 15:42
  • @user984228 If it's already in a buffer that you could memcpy from then you could *also* use the vector `reserve` and `insert` combination. – Mark B Oct 07 '11 at 15:44
  • @DavidRodríguez-dribeas: It seems that insert() grows the vector in an intelligent way (not *just* enough to hold the inserted data), so it seems much simpler to keep doing what I have been doing. Thanks for all the answers, at least I know that there is no better way to do this with vectors. – user984228 Oct 07 '11 at 15:46
  • 1
    @user984228: I don't know what you meant by *intelligent, not just enough to hold the inserted data*. My feeling is that you have a preconception that might or might not be correct and are drawing conclusions based on that. What is the *intelligent* bit that `insert` does and will help you here? – David Rodríguez - dribeas Oct 07 '11 at 15:52
  • 1
    @user984228 If you really fear the initialization (which could be a problem with more complex objects, but surely not with chars), you can also use `std::get_temporary_buffer` instead of a `std::vector`, perhaps together with `std::uninitialized_copy` or a more general `std::raw_storage_iterator`, but then you have the disadvantage of working with plain arrays, instead of a `std::vector`. – Christian Rau Oct 07 '11 at 15:54
  • 2
    @MarkB: would not you expect a good implementation of insert (range) to be specialized with a single reserve call for random iterators ? – Matthieu M. Oct 07 '11 at 16:14
  • 5
    @user984228: `Then I'd rather just use the temporary buffer + insert. It should be at least as efficient,` Incorrect. The temporary buffer avoids the zero initialization, reads into buffer, then requires a copy from buffer to vector. Vector resize has a zero initialization, then reads into vectors. Zero initialization is _at least_ as fast, probably faster, than a copy. Ergo, resize is still faster than a buffer. – Mooing Duck Oct 07 '11 at 16:17
  • @DavidRodríguez-dribeas: What I mean is that, if I make a program that reads a file into a vector in 4 kB blocks, it is *really* fast if you use insert(). I assume that every time it needs to reallocate, it doubles the memory (or something like that), so that the next allocation won't happen immediately after. When I put a vec.reserve(4096) before every insert, it runs insanely slow (I'm talking milliseconds vs minutes). With insert(), I can just append the data I need without worrying about reallocation, because it doesn't happen on every insert (unlike reserve). – user984228 Oct 07 '11 at 16:17
  • I meant to say: "vec.reserve(vec.capacity() + 4096)", not just "vec.reserve(4096)" (but I'm sure everyone knew that ;)). – user984228 Oct 07 '11 at 16:22
  • @user984228: of course it does, the correct code is `vec.reserve(vec.size()+4096)`. If you use `size` you benefit from exponential growth. With `capacity` you systematically grow the buffer, and end up with a much higher capacity than you will ever use. – Matthieu M. Oct 07 '11 at 16:28
  • @MatthieuM.: No, on my implementation, reserve() gives you exactly what you asked for. If you do vector::reserve(N) where N > capacity() it reallocates to N. – user984228 Oct 07 '11 at 16:45
  • @Matthieu M. It would be somewhat expected but the standard doesn't require it and g++ 4.2 anyway doesn't (directly) differentiate between forward and random access iterators in its input range. What it does however is use `std::distance` to preallocate for *any* iterator type. – Mark B Oct 07 '11 at 16:59
  • @user984228 of course using a linearly increasing size for your reserve is going to cause significant performance problems as it continually has to copy your data around every time you increase it. If you use `resize` though it will use its normal size-scaling algorithm rather than linearly increasing it. – Mark B Oct 07 '11 at 17:05
  • @user984228: crap, that's pretty short-sighted... well, then just implement exponential growth yourself :x – Matthieu M. Oct 07 '11 at 17:33
  • @Matthieu M. But if he properly used `resize` instead of `reserve` he'd get the scaling growth factor automatically. – Mark B Oct 07 '11 at 17:35
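
The reserve() versus resize() point from the comments above can be sketched as a chunked read loop, assuming a POSIX system. The helper name append_all and the 4096-byte default are illustrative choices, not taken from any library. Each iteration grows size() with resize(), reads straight into the new tail, then trims to what was actually read. The comments report that resize() follows the implementation's usual growth policy while reserve() may allocate exactly what is requested; the standard does not spell out either policy, so treat that as implementation behavior to measure rather than a guarantee.

```cpp
#include <cstddef>
#include <unistd.h>   // POSIX read()
#include <vector>

// Hypothetical helper: appends everything readable from fd to buf, one chunk at a time.
// Returns false if read() reports an error; bytes read before the error stay in buf.
bool append_all(int fd, std::vector<char>& buf, std::size_t chunk = 4096) {
    for (;;) {
        const std::size_t old_size = buf.size();
        buf.resize(old_size + chunk);          // grow size(), not just capacity()
        const ssize_t m = ::read(fd, buf.data() + old_size, chunk);
        if (m <= 0) {
            buf.resize(old_size);              // drop the unused tail
            return m == 0;                     // 0 means end of input, negative means error
        }
        buf.resize(old_size + static_cast<std::size_t>(m));
    }
}
```

The cost of this version is one value-initialization of each chunk before it is overwritten, which is the trade-off discussed throughout this thread.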
12

A newer question, a duplicate of this one, has an answer that looks like exactly what is asked here. Here is a copy (of v3) for quick reference:

It is a known issue that initialization can not be turned off even explicitly for std::vector.

People normally implement their own pod_vector<> that does not do any initialization of the elements.

Another way is to create a type which is layout-compatible with char, whose constructor does nothing:

#include <vector>

struct NoInitChar
{
    char value;
    NoInitChar() {
        // Intentionally empty: value is left uninitialized.
        static_assert(sizeof(NoInitChar) == sizeof(char), "invalid size");
        static_assert(alignof(NoInitChar) == alignof(char), "invalid alignment");
    }
};

int main() {
    std::vector<NoInitChar> v;
    v.resize(10); // calls NoInitChar() which does not initialize

    // Look ma, no reinterpret_cast<>!
    char* beg = &v.front().value;
    char* end = beg + v.size();
}
Community
Ruslan
9

It looks like you can do what you want in C++11 (though I haven't tried this myself). You'll have to define a custom allocator for the vector, then use emplace_back().

First, define

struct do_not_initialize_tag {};

Then define your allocator with this member function:

class my_allocator {
public:
    // Called by emplace_back(do_not_initialize_tag()); leaves *c uninitialized.
    void construct(char* c, do_not_initialize_tag) const {
        // do nothing
    }

    // details omitted
    // ...
};

Now you can add elements to your vector without initializing them:

std::vector<char, my_allocator> buf;
buf.reserve(N);
for (int i = 0; i != N; ++i)
    buf.emplace_back(do_not_initialize_tag());
int M = read(fd, buf.data(), N);
buf.resize(M);

The efficiency of this depends on the compiler's optimizer. For instance, the loop may increment the size member variable N times.
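
A commonly seen variation on this idea, which is not the tag-based construct() shown above, is an allocator adaptor whose zero-argument construct() default-initializes the element instead of value-initializing it. With that in place a plain resize(N) leaves the chars uninitialized and no emplace_back loop is needed. This is a minimal sketch assuming C++11 or later; the names default_init_allocator and raw_buffer are illustrative, not standard library names.

```cpp
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor (illustrative name): wraps allocator A and changes only
// what happens when an element is constructed with no arguments.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the base allocator's constructors

    // No arguments: default-initialize, which for char writes nothing.
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Any arguments: defer to the wrapped allocator's normal construction.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        traits::construct(static_cast<A&>(*this), p, std::forward<Args>(args)...);
    }
};

// Illustrative alias: a char vector whose resize(n) does not zero-fill.
using raw_buffer = std::vector<char, default_init_allocator<char>>;
```

With this alias, the read pattern becomes buf.resize(N), then read(fd, buf.data(), N), then buf.resize(M) after checking M for errors. The new bytes hold indeterminate values until written, so they must not be read before read() has filled them.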

Derek Ledbetter
  • You cannot 'emplace_back' anything other than 'char's to your 'std::vector' of chars – Gils May 05 '23 at 16:21
2

Writing to the size()th element or beyond is undefined behavior.

The following example copies a whole file into a vector in a C++ way (no need to know the file's size and no need to preallocate the memory in the vector):

#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    typedef std::istream_iterator<char> istream_iterator;
    std::ifstream file("example.txt");
    std::vector<char> input;

    file >> std::noskipws;
    std::copy( istream_iterator(file), 
               istream_iterator(),
               std::back_inserter(input));
}
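
A closely related sketch uses std::istreambuf_iterator, which reads raw characters from the stream buffer and therefore needs no std::noskipws. The function name read_file is hypothetical, and opening the file in binary mode is an assumption about what is wanted here.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the full contents of the file at path.
// Returns an empty vector if the file cannot be opened.
std::vector<char> read_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    // Range constructor: copies characters until the stream buffer is exhausted.
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

Like the std::copy version, this grows the vector as it goes because the iterators are single-pass, so it trades the explicit size bookkeeping for possible reallocations.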
BЈовић
1

Your program fragment has entered the realm of undefined behavior.

When buf.empty() is true, buf[0] has undefined behavior, and therefore &buf[0] is also undefined.

This fragment probably does what you want.

vector<char> buf;
buf.resize(N); // make N elements available to write into
int M = read(fd, &buf[0], N);
buf.resize(M); // disallow access to the remainder
Andy Finkenstadt