
TL;DR:

How do I resize (adjust both length and capacity of) a C++ STL vector without any initialization? Garbage values are acceptable!

Question

I am aware that std::vector has a resize() method, but it initializes the new elements, which may be unnecessary.

Moreover, I found a set_len() function in Rust that does exactly what I want. Is there a way (even a hacky one) to achieve this with the C++ STL?

The set_len() doc in Rust can be found here.

EDIT 1

  1. I am aware that setting a length that is larger than the vector's capacity is undefined behavior, and I have to be very careful (unsafe fn, sure enough), but I am talking about those cases where the_new_length_i_am_setting <= vec.capacity() is GUARANTEED (I have already reserved() correctly).

  2. I don't really care what values will be filled into those extra spaces (garbage is acceptable) since I will manually overwrite them carefully afterward. The difference between malloc() and calloc() is a perfect analogy of what I am talking about.

  3. My use case: store bytes from multiple read() or recv() calls directly into the same vector without using an extra array/buffer. (I have achieved this with Rust's Vec using reserve() and then set_len(), but I failed to find an equivalent of set_len() for C++'s std::vector.)

  4. To make things easier to understand: I am basically trying to use a vector with Linux system APIs that accept only arrays. Calling realloc() on a malloc()-ed array would do the same job, but it is error-prone, especially when dealing with indices.

jerryc05
  • What values **do** you want to fill the extra space with? – Igor Tandetnik Apr 26 '20 at 02:06
  • The Rust `set_len` function causes undefined behaviour if the extra space contains uninitialized values. Moreover, this just sets the length of the `Vec` and doesn't resize the underlying allocation to be of the correct size. What you want to do in Rust is `my_vec.reserve(total_size - my_vec.len())`. I'm not versed in C++ so I don't know what is best for C++, but the page on reserve says nothing about resetting the space. – Optimistic Peach Apr 26 '20 at 02:11
  • @OptimisticPeach -- How do you know what he wants to do, and that it's not what he said he wants to do? – Benjamin Lindley Apr 26 '20 at 02:24
  • Because in their post they talk about "filling the extra space with default values", which leads me to believe they are talking about either resizing the underlying allocation and using the space, or setting the length of the `Vec` and treating the rest of the memory as already initialized. The question isn't well worded, so I'm not clear on what they want to achieve in C++ that they do in Rust. However, I know that the common goal of all systems language developers is to avoid undefined behaviour. – Optimistic Peach Apr 26 '20 at 02:30
  • @jerryc05 Could you please edit your question to describe what your usage of `set_len` was in Rust? If you were trying to resize the allocation beneath the `Vec`, then you should be using `reserve` (or just use `with_capacity` in the first place). If you were trying to set the field which describes the number of valid items without actually making sure they're initialized, then you're invoking undefined behaviour. – Optimistic Peach Apr 26 '20 at 02:34
  • `set_len` on its own is not UB, if the uninitialized values are not accessed (indexing, slicing, dropping, cloning the Vec etc) – ramslök Apr 26 '20 at 08:17
  • @cigien I really have no idea what's wrong with my link to Rust's `Vec` documentation. Could you be more specific? – jerryc05 Apr 26 '20 at 22:36
  • @OptimisticPeach I have made several clarifications/edits, and I hope that helps a bit. Thank you for your help anyway. – jerryc05 Apr 26 '20 at 22:41
  • You can only do it if you supply a custom allocator, see https://stackoverflow.com/questions/21028299/is-this-behavior-of-vectorresizesize-type-n-under-c11-and-boost-container/21028912 (if you use that allocator, `resize` will always leave new elements default-initialized, which for integers means uninitialized). Also, if you simply need an uninitialized buffer without resizing, copying, etc. (and if your compiler is new enough), you could use `auto buffer = std::make_unique_for_overwrite<T[]>(Size);`. – HolyBlackCat Apr 26 '20 at 22:44
  • @HolyBlackCat Okay thanks, good to know that. However, using Rust's `set_len()` is much more elegant than having to write a custom allocator. Hmmm. :( – jerryc05 Apr 26 '20 at 22:56
  • Well, you don't have too many options here. The linked question also mentions `boost::container::vector`, which allegedly has an extra overload of `resize` that doesn't initialize new elements. – HolyBlackCat Apr 26 '20 at 23:01
  • Just to be sure we're talking about the same thing, the only way to do this soundly in Rust is `reserve`, *then* initialize (using pointer arithmetic or `MaybeUninit`-slices), *then* `set_len`. If you call `set_len(n)` *before* initializing the first `n` elements you incur undefined behavior just as in C++. – trent Apr 27 '20 at 02:05
  • @ramslök, specifically `set_len` on its own _is_ UB. If you don't `std::mem::forget` the `Vec`, it'll drop the values that are supposed to be initialized. Doing _anything_ with a `Vec` with a broken invariant is UB, since unsuspecting safe code might access the variables, or maybe the `Vec` is cloned, dropped, etc. As trentcl explains, the only way to make sure the remaining elements in the allocation are valid is to initialize them and _then_ treat them as initialized via `set_len`. – Optimistic Peach Apr 27 '20 at 03:53
  • I'd claim that it is not UB on its own. It might have UB consequences depending on what happens next, like the drop that you are mentioning. There I absolutely agree. How can we conclude that set_len on its own is UB? Ralf Jung has in various discussions suggested that it's not. ([Link](https://github.com/rust-lang/rust-clippy/issues/4483#issuecomment-526950069)) – ramslök Apr 27 '20 at 12:16

1 Answer

Preface (this might be long, but it's very important)

You might say that "garbage values" are acceptable, but they are not. When people say they have garbage values in their C++ code, they really don't; what they actually have is Undefined Behavior. And you shouldn't take Undefined Behavior lightly. Allow me to quote from another answer of mine:

void foo();
void bar();

void test(bool cond)
{
    int a; // uninitialized

    if (cond)
        a = 24;

    if (a == 24)
        foo();
    else
        bar();
}

What is the result of calling the above function with true? What about with false?

test(true) will clearly call foo().

What about test(false)? If you answer, "Well, it depends on what garbage value is in variable a: if it is 24 it will call foo, else it will call bar", then you are completely wrong.

If you call test(false), the program reads an uninitialized variable and has Undefined Behavior. That path is illegal, so compilers are free to assume cond is never false (because otherwise the program would be illegal). And, surprise surprise, both gcc and clang with optimizations enabled actually do this and generate this assembly for the function:

test(bool):
        jmp     foo()

The moral of the story is that UB is UB. Don't rely on any kind of behavior. Accessing uninitialized variables or memory doesn't result in garbage values; it results in UB, and the results can be very bad, extremely surprising and difficult to debug.


Going back to your question: no, there is no way in std::vector to allocate uninitialized memory that you can access. That is, this is UB and you should 100% not do this:

std::vector<int> v = ...;

v.reserve(v.size() + 100);

v[v.size()] = 11; // Access beyond vector's size,
                  // Undefined Behavior even if you reserved memory for it
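
If zero-initializing the new bytes is tolerable, one defined alternative for the read()/recv() use case is to resize() first, write into the new tail, and then shrink to the number of bytes actually received. The following is only a rough sketch: `fake_read` and `append_all` are hypothetical names, and `fake_read` is a portable stand-in for a real read() or recv() call (which can also fail, something this sketch does not handle).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for read()/recv(): copies up to `cap` bytes from `src`
// starting at `offset` into `dst` and returns the number of bytes copied.
std::size_t fake_read(const std::string& src, std::size_t offset,
                      unsigned char* dst, std::size_t cap)
{
    if (offset >= src.size() || cap == 0) return 0;
    std::size_t n = src.size() - offset;
    if (n > cap) n = cap;
    std::memcpy(dst, src.data() + offset, n);
    return n;
}

// Appends all of `src` to `buf`, at most `chunk` bytes per call to fake_read.
void append_all(std::vector<unsigned char>& buf, const std::string& src,
                std::size_t chunk)
{
    std::size_t offset = 0;
    for (;;) {
        const std::size_t old_size = buf.size();
        buf.resize(old_size + chunk);   // new bytes are zero-initialized here
        const std::size_t n =
            fake_read(src, offset, buf.data() + old_size, chunk);
        assert(n <= chunk);             // the reader must respect the capacity
        buf.resize(old_size + n);       // keep only what was actually written
        if (n == 0) break;              // nothing more to read
        offset += n;
    }
}
```

Every index that is written lies inside size() at the time of the write, so no access goes beyond the vector's size. The cost is the zeroing done by resize(), which is exactly the overhead the question wants to avoid.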

"I am aware that setting a length that is larger than the vector's capacity is undefined behavior"

No, it's not:

std::vector::resize

void resize( size_type count );

Resizes the container to contain count elements.

If the current size is less than count, additional default-inserted elements are appended

Of course, this has the disadvantage that elements will be default-inserted. But as I've said, there is no way around this in std::vector.
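
As a small illustration of the quoted behavior, the sketch below grows a vector past its current capacity with resize() and checks that the appended int elements are zero, which is what default insertion means for int with the default allocator. The helper name `grow_past_capacity_is_zeroed` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grows `v` to capacity() + extra elements with resize() and reports whether
// every newly appended element compares equal to 0.
bool grow_past_capacity_is_zeroed(std::vector<int>& v, std::size_t extra)
{
    const std::size_t old_size = v.size();
    const std::size_t new_size = v.capacity() + extra;  // > capacity() if extra > 0
    v.resize(new_size);                                  // reallocates as needed
    assert(v.size() == new_size);
    for (std::size_t i = old_size; i < v.size(); ++i)
        if (v[i] != 0) return false;
    return true;
}
```

Calling it on a vector holding {1, 2, 3} leaves the first three elements untouched and appends zeros after them.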


Doing what you want is actually theoretically impossible even without std::vector. There is a long-standing issue with the object lifetime rules in C++ (even though it is de facto ignored in practice). The paper P0593R2 tries to solve this. But even with a solution adopted into the standard, you would still need to implement your own container to do what you desire.
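
For completeness, the custom-allocator route that HolyBlackCat mentions in the comments under the question could look roughly like the sketch below. It is modeled on the well-known allocator-adaptor idea from the linked question, not on anything std::vector offers by default; the names `default_init_allocator` and `raw_byte_vector` are illustrative. With such an allocator, resize(n) default-initializes new elements, so for a type like unsigned char they are simply not zeroed. Reading them before writing them still runs into everything said in the preface.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor: construct() without arguments default-initializes the
// element (no zeroing for trivial types) instead of value-initializing it.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;  // inherit the base allocator's constructors

    // Used by resize(n): default-initialization, i.e. no zeroing.
    template <typename U>
    void construct(U* ptr) noexcept(std::is_nothrow_default_constructible<U>::value)
    {
        ::new (static_cast<void*>(ptr)) U;
    }

    // Used when arguments are supplied, e.g. resize(n, value) or push_back().
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args)
    {
        traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
    }
};

// Illustrative alias: a byte buffer whose resize(n) does not zero new bytes.
using raw_byte_vector =
    std::vector<unsigned char, default_init_allocator<unsigned char>>;
```

A `raw_byte_vector` is a different type from `std::vector<unsigned char>`, so it cannot be passed where the latter is expected, which is one reason this is a workaround rather than an equivalent of Rust's set_len().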

bolov
  • Thanks for your info. However, without std::vector, I can `malloc()` and `realloc()` an array without initializing newly-allocated spaces, am I right? – jerryc05 Apr 27 '20 at 05:25
  • @jerryc05 yes and no. If you read §2.1 of the paper I've mentioned, you will see a simple example with `malloc` which, while idiomatic in C, is technically UB in C++ because `malloc` (and `calloc`) doesn't create an object as per the C++ rules. This means that technically in C++ you can't use `malloc` (and `calloc`) to create objects. And while we're here: `std::vector` cannot be implemented without UB, but it gets away with it because it's implemented in the standard library and, honestly, because it has to. In practice, though, there's a lot of C++ code out there that uses `malloc`. – bolov Apr 27 '20 at 08:48