50

I'm making a C++ wrapper for a piece of C code that returns a large array, and so I've tried to return the data in a vector<unsigned char>.

Now the problem is, the data is on the order of megabytes, and vector unnecessarily initializes its storage, which essentially turns out to cut down my speed by half.

How do I prevent this?

Or, if it's not possible -- is there some other STL container that would avoid such needless work? Or must I end up making my own container?

(Pre-C++11)

Note:

I'm passing the vector as my output buffer. I'm not copying the data from elsewhere.
It's something like:

vector<unsigned char> buf(size);   // Why initialize??
GetMyDataFromC(&buf[0], buf.size());
Community
  • 1
  • 1
user541686
  • 205,094
  • 128
  • 528
  • 886
  • Check out the different constructors and the `assign` function. Just pass your data in. – chris Jun 22 '12 at 03:06
  • 3
    @chris: I can't -- I'm passing `&my_vector[0]` to the C code as my output buffer! – user541686 Jun 22 '12 at 03:07
  • This would be tricky to design - it would be fundamentally unsafe if you stored non-trivially-copyable types like `std::string`. I'm curious what the appropriate answer is. – templatetypedef Jun 22 '12 at 03:07
  • 2
    @templatetypedef: Yeah, me too. :-) I just hope it's not "don't use C++"... – user541686 Jun 22 '12 at 03:10
  • 4
    I know you said pre-C++11, but in C++11 a `std::unique_ptr` would fit the bill here nicely, so maybe `boost::scoped_array` could suffice for now? – ildjarn Jun 22 '12 at 03:16
  • @ildjarn: Allocated by `malloc` you mean, right? – user541686 Jun 22 '12 at 03:20
  • Can you give pseudo code, demonstrating how exactly you are dealing with this for now? – iammilind Jun 22 '12 at 03:20
  • 5
    @Mehrdad : No, allocated without value-initialization (i.e. `new unsigned char[N]` rather than `new unsigned char[N]()`) -- you do realize `new[]` doesn't need to initialize anything for scalars, right?. – ildjarn Jun 22 '12 at 03:22
  • @iammilind: Done; hope that's clearer. – user541686 Jun 22 '12 at 03:24
  • @ildjarn: Oh really?! No I didn't know that... (I guess that shows I don't use `new[]` very often!) That's great to know, thanks! – user541686 Jun 22 '12 at 03:26
  • 3
    @Mehrdad : [Here's](http://ideone.com/UXUjK) an online demo that has exactly the semantics you desire (and again, `boost::scoped_array<>` for C++03). [Here's](http://stackoverflow.com/a/5602057/636019) a link to an answer with the standardese describing default-initialization (no `()`) vs. value-initialization (with `()`). – ildjarn Jun 22 '12 at 03:28
  • How portable do you care to be? Google protocol buffers claims to have a similar hack, and they just dig into STL internals: http://code.google.com/p/protobuf/source/browse/trunk/src/google/protobuf/stubs/stl_util-inl.h#67 – phs Jun 22 '12 at 03:29
  • @phs: To be honest I was actually considering hacking into STL :-) portability isn't important right now, I'm just on Windows, with Visual C++... – user541686 Jun 22 '12 at 03:30
  • @ildjarn: Yup I understand it, thanks! – user541686 Jun 22 '12 at 03:31
  • Honestly, just use an array, or an array wrapped in a C++11 smart pointer if you can. Is using an array *really* that bad? It comes down to how your code is structured. If it can be allocated and cleaned up safely then stop worrying about it and move on to real problems. – Ed S. Jun 22 '12 at 04:02
  • I tried some of the handmade containers, I feel that `new[]` is the most viable option for you instead of `vector`. – iammilind Jun 22 '12 at 04:09
  • @templatetypedef "_This would be tricky to design_" It took me split second to design: `new (&*pos) T` – curiousguy Aug 11 '12 at 01:43
  • @curiousguy- The main complexities would be how you would design the object so that if you tried to read or write a value, it could distinguish between the cases where you were (a) constructing it for the first time, or (b) reassigning the value. You would also have to track what objects were created so that the destructor for the object could destroy all objects constructed in-place. The actual construction of objects in-place would not be that hard. – templatetypedef Aug 11 '12 at 03:25
  • 1
    This seems a **duplicate** of [this](http://stackoverflow.com/questions/15952412/performance-degradation-due-to-default-initialisation-of-elements-in-standard-co) question – Walter Aug 22 '13 at 11:45

5 Answers5

57

For default and value initialization of structs with user-provided default constructors which don't explicitly initialize anything, no initialization is performed on unsigned char members:

struct uninitialized_char {
    unsigned char m;
    uninitialized_char() {}
};

// just to be safe
static_assert(1 == sizeof(uninitialized_char), "");

std::vector<uninitialized_char> v(4 * (1<<20));

GetMyDataFromC(reinterpret_cast<unsigned char*>(&v[0]), v.size());

I think this is even legal under the strict aliasing rules.

When I compared the construction time for v vs. a vector<unsigned char> I got ~8 µs vs ~12 ms. More than 1000x faster. Compiler was clang 3.2 with libc++ and flags: -std=c++11 -Os -fcatch-undefined-behavior -ftrapv -pedantic -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-missing-prototypes

C++11 has a helper for uninitialized storage, std::aligned_storage. Though it requires a compile time size.


Here's an added example, to compare total usage (times in nanoseconds):

VERSION=1 (vector<unsigned char>):

clang++ -std=c++14 -stdlib=libc++ main.cpp -DVERSION=1 -ftrapv -Weverything -Wno-c++98-compat -Wno-sign-conversion -Wno-sign-compare -Os && ./a.out

initialization+first use: 16,425,554
array initialization: 12,228,039
first use: 4,197,515
second use: 4,404,043

VERSION=2 (vector<uninitialized_char>):

clang++ -std=c++14 -stdlib=libc++ main.cpp -DVERSION=2 -ftrapv -Weverything -Wno-c++98-compat -Wno-sign-conversion -Wno-sign-compare -Os && ./a.out

initialization+first use: 7,523,216
array initialization: 12,782
first use: 7,510,434
second use: 4,155,241


#include <iostream>
#include <chrono>
#include <vector>

struct uninitialized_char {
  unsigned char c;
  uninitialized_char() {}
};

void foo(unsigned char *c, int size) {
  for (int i = 0; i < size; ++i) {
    c[i] = '\0';
  }
}

int main() {
  auto start = std::chrono::steady_clock::now();

#if VERSION==1
  using element_type = unsigned char;
#elif VERSION==2
  using element_type = uninitialized_char;
#endif

  std::vector<element_type> v(4 * (1<<20));

  auto end = std::chrono::steady_clock::now();

  foo(reinterpret_cast<unsigned char*>(v.data()), v.size());

  auto end2 = std::chrono::steady_clock::now();

  foo(reinterpret_cast<unsigned char*>(v.data()), v.size());

  auto end3 = std::chrono::steady_clock::now();

  std::cout.imbue(std::locale(""));
  std::cout << "initialization+first use: " << std::chrono::nanoseconds(end2-start).count() << '\n';
  std::cout << "array initialization: " << std::chrono::nanoseconds(end-start).count() << '\n';
  std::cout << "first use: " << std::chrono::nanoseconds(end2-end).count() << '\n';
  std::cout << "second use: " << std::chrono::nanoseconds(end3-end2).count() << '\n';
}

I'm using clang svn-3.6.0 r218006

bames53
  • 86,085
  • 15
  • 179
  • 244
  • 3
    +1 because this is awesome. The only trouble is that it would require some hacking around with `reinterpret_cast` to get the interface to be standard (and to hope for the best :P), but I think it's otherwise pretty great! :D – user541686 Jun 22 '12 at 04:15
  • @Mehrdad, I had tried this wrapper stuff, but in my ubuntu/g++ platform it turns out to be worse; I ran [this code](http://ideone.com/or8Za) and it gives 6 seconds with wrapper way and 1 second with `unsigned char` way. Optimization was `-O4`. I strongly recommend to use `new[]` or `malloc` instead of `vector` if the performance is such an issue. – iammilind Jun 22 '12 at 04:21
  • @iammilind I compared those and got 21 seconds vs. 0 seconds in favor of UninitializedChar (using the same flags I show in my post). So there may be some platform/library implementation differences. – bames53 Jun 22 '12 at 04:50
  • I had also thought of same solution, so +1 from me as well, even though it doesn't work so nicely in my platform. However have you tried with `-O4` in both the cases. I think that would change the dynamics. – iammilind Jun 22 '12 at 04:57
  • @iammilind With -O4 your test case drops to 15 seconds vs. 0, and my test case drops to ~9 ms vs ~50 ns. If I add touching the memory to my test case (assigning a value something to v[0]) then the -Os results are ~12 ms vs. ~15 µs, and -O4 reduces it to ~9 ms vs. still ~15 µs (though there's a lot of variability around that ~15 µs). – bames53 Jun 22 '12 at 05:07
  • I think the second assertion holds by definition; the address of a struct is the same as the address of its first member. – musiphil Jun 22 '12 at 05:08
  • @musiphil I think you're right. I'll just remove that one. And as long as the first one holds I don't see how the last one could possibly be false. (so I'll remove that as well `assert(&(v[0].m) + 1 == &(v[1].m));`) – bames53 Jun 22 '12 at 05:09
  • The third assertion is implied by the first, since `&(v[1].m) - &(v[0].m) == (&(v[1]) - &(v[0])) * sizoef(v[0]) == 1 * sizeof(uninitialized_char) == 1`. – musiphil Jun 22 '12 at 05:11
  • @bames53: to avoid the `reinterpret_cast`, you could simply pass `&(v[0].m)`, or is the cast intentional ? – Matthieu M. Jun 22 '12 at 06:28
  • 1
    @MatthieuM. Should be practically equivalent. Though I wonder if that would be technically undefined behavior (since m works as a single element array and we'll be accessing outside that range) whiled reinterpret cast is technically implementation defined? – bames53 Jun 22 '12 at 06:38
  • 1
    @bames53: Sometimes the standard is too weird for me ;) – Matthieu M. Jun 22 '12 at 07:11
  • 1
    @Mehrdad: You should be able to get rid of the `reinterpret_cast` by passing `&v[0].m`. – Michael Burr Jun 22 '12 at 15:32
  • @bames53: Just a note, I take "undefined" and "implementation defined" to mean the same thing, since both of them are saying, "not defined by this standard". So if all you're referring to is that standard, then it's undefined. – user541686 Jun 22 '12 at 16:20
  • 4
    @Mehrdad Undefined behavior means more than that though. UB means the standard places no requirements on any part of the program, before or after the UB. Programs can't avoid being in part implementation defined, but that doesn't release them from being constrained by the standard the way UB does. – bames53 Jun 22 '12 at 17:19
  • @bames53: I completely agree with your definition of UB. But for me, that's synonymous with "implementation defined" -- either way, if the implementation's documentation didn't tell you otherwise, then it's undefined. :-) – user541686 Jun 22 '12 at 17:51
  • 4
    @Mehrdad: I hope you understand the issue correctly, but it's quite misleading how you say it: dereferencing an invalid pointer is undefined, and whether a `char` is signed or unsigned is implementation-defined; how can you say those two are _synonymous_? – musiphil Jun 22 '12 at 23:28
  • @musiphil: The implementation may define that dereferencing an invalid pointer may, for example, bring out your nasal demons. Then this becomes implementation-defined. Also, the standard has *not defined* whether `char` is signed or not (hence, it's undefined), but when the implementation does, it becomes implementation-defined. They're more or less the same thing -- what you call it depends on whether the implementation happened to define it, *not* whether the standard calls it "undefined" or "implementation-defined". So *for the purposes of interpreting the C++ standard, they're synonymous.* – user541686 Jun 22 '12 at 23:52
  • Of course, the standard may *require* that the implementation define something. Then it's never undefined for any particular implementation (obviously...), but it's still undefined [by the standard]. So in that case, when you're not talking about any particular implementation, they're synonymous. Or at least that's how I see it... – user541686 Jun 22 '12 at 23:59
  • 2
    @Mehrdad You seem to be using the term 'undefined behavior' simply in its colloquial sense. The C++ standard assigns special significance to these terms and does not use them interchangeably. E.g. Even if an implementation defines the effects of dereferencing an invalid pointer, the result is still _undefined-behavior_ as that term is defined in the _C++ standard [defns.undefined] 1.3.24_. – bames53 Jun 23 '12 at 00:18
  • @bames53: Ah. Yes, I definitely am. – user541686 Jun 23 '12 at 00:23
  • 3
    "Implementation-defined" refers to a particular construct; "Undefined" refers to _the entire program_. They are not even close to synonymous, even when you do not specify an implementation. (This is a massively under-appreciated point about UB, even by people who think they know about UB.) – Nemo Jul 23 '12 at 05:15
  • `std::vector::data()` is a C++11 feature, but the question asked for a pre-C++11 solution, so `v.data()` should be written as `&v[0]`. – Adrian McCarthy Sep 18 '14 at 23:01
  • "~8 µs vs ~12 ms." But do both examples go on to populate the vector in GetMyDataFromC? – Neil Kirk Sep 18 '14 at 23:37
  • @NeilKirk Those times are only the time it takes for the initialization. – bames53 Sep 19 '14 at 00:14
  • It is not a fair test, because OS may not actually allocate the memory until it is touched, which won't happen until the function is reached. If the total times are 11ms vs 12ms for example, that's quite a different result. It may even be negligible. – Neil Kirk Sep 19 '14 at 10:01
  • @NeilKirk Including an assignment does not significantly change the time. On my current system the vector initialization with `uninitialized_char` takes ~12µs with or without an assignment. Furthermore, the goal is to match the performance of `malloc`ing a buffer, which will have exactly the same behavior of avoiding allocating real memory pages. – bames53 Sep 19 '14 at 14:00
  • 1
    @NeilKirk For example, if I time both the vector initialization and the use of the memory then using `vector` gives times of 12ms for initialization and 4ms for a usage, while `uninitialized_char` gives the 12µs for initialization and 7.5ms for the first usage. That means the total with `vector` is 16ms and the total with uninitialized_char is 7.5ms, or less than half. That matches the OPs comment that using `vector` vs. `malloc` cut his speed in half. – bames53 Sep 19 '14 at 14:01
9

Sorry, there's no way to avoid it.

C++11 adds a constructor that takes only a size, but even that will value-initialize the data.

Your best bet is to just allocate an array on the heap, stick it in a unique_ptr (where available), and use it from there.

If you're willing to, as you say, "hacking into STL," you could always grab a copy of EASTL to work from. It's a variation of certain STL containers that allows for more restricted memory conditions. A proper implementation of what you're trying to do would be to give its constructor a special value that means "default initialize the members," which for POD types means to do nothing to initialize the memory. This requires using some template metaprogramming to detect if it is a POD type, of course.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • Ugh, I was afraid of that. +1. Btw, the problem isn't the lack of a constructor with a pointer -- even then, I'm passing the vector *as* my buffer, so that it's filled in directly. I'm not copying it from somewhere else. – user541686 Jun 22 '12 at 03:13
  • @Mehrdad, Guess this is where those move semantics work quite nicely, had you actually been able to use them. – chris Jun 22 '12 at 03:14
  • 1
    @chris: Yes, but move from... what? – user541686 Jun 22 '12 at 03:15
  • @Mehrdad: I didn't say anything about a constructor with a pointer. My point in mentioning that constructor is that C++03 had a size constructor that copy-initialized each value from its second argument (taking longer than default-init). C++11 provides default init, but nothing provides no initialization. – Nicol Bolas Jun 22 '12 at 03:15
  • @Mehrdad, You could move a normal array over. – chris Jun 22 '12 at 03:15
  • 1
    @chris: I wouldn't know the array's size at compile-time though. Or perhaps I don't understand what you mean? – user541686 Jun 22 '12 at 03:16
  • 2
    @chris: You can only move things that are compatible with each other. In general, you can only move one class instance into another instance of the same class, as moving requires poking at the guts of the object. And you certainly can't just move a `Type[]` into a `std::vector`. – Nicol Bolas Jun 22 '12 at 03:18
  • @NicolBolas, Wow, I'm definitely not thinking straight. For some reason I was thinking there was a move form of the begin/end constructor. Time for bed... – chris Jun 22 '12 at 03:20
  • 1
    @chris : One could use `std::move_iterator<>` and call the begin/end constructor, but that would still only do element-wise moving, which wouldn't be useful for `unsigned char`. – ildjarn Jun 22 '12 at 03:24
  • @Mehrdad: Added some advice about how to go about implementing it. – Nicol Bolas Jun 22 '12 at 03:49
  • 3
    @Mehrdad: If your code is simple enough and the array's lifetime is short just dynamically allocate one and move on. Arrays are not inherently unsafe, it's just that it gets hard to properly manage deallocation of dynamically allocated objects in large projects and in the face of possible exceptions. – Ed S. Jun 22 '12 at 04:11
  • @NicolBolas: Haha thanks, yeah I don't think I should have too trouble with that. :) – user541686 Jun 22 '12 at 04:12
  • 1
    @EdS.: Yeah I think I'm OK at getting them right... it's just that the container would be nonstandard (and people would keep on telling me to avoid arrays >_<) but I don't have any fundamental objections to it I guess. – user541686 Jun 22 '12 at 04:13
  • @Mehrdad: If you *really* want a quick and dirty container (and don't have access to something already) you can whip on up easily. You get the gist; allocate in the constructor, deallocate in the destructor. You may as well make the copy constructor and assigment operator private so you don't have to worry about multiple containers pointing to the same array. – Ed S. Jun 22 '12 at 07:26
  • Your answer 'Sorry, there's no way to avoid it.' is wrong. There is a way to avoid: to use a [default-initializing allocator](https://stackoverflow.com/a/21028912/427158). – maxschlepzig Nov 03 '18 at 14:38
4

The optimal solution is to simply change the allocator to do nothing for a zero-arguments construct. This means that the underlying type is the same, which dodges any kind of nasty reinterpret_casting and potential aliasing violations and can non-intrusively uninitialize any type.

template<typename T> struct no_initialize : std::allocator<T> {
    void construct(T* p) {}
    template<typename... Args> void construct(T* p, Args&&... args) {
        new (p) T(std::forward<Args>(args)...);
    }
};
Puppy
  • 144,682
  • 38
  • 256
  • 465
  • 1
    And then when the de-allocator is invoked...? – Lightness Races in Orbit Sep 18 '14 at 21:41
  • 1
    If you want uninitialized objects, then dealing with the consequences of having them is your problem. It's a dumb idea to apply it to non-trivially-destructible types in the first place. – Puppy Sep 18 '14 at 21:45
  • 6
    Still, I don't see how you can call this "the optimal solution" in a teaching context when you don't even address the obvious flaw. – Lightness Races in Orbit Sep 18 '14 at 22:22
  • 1
    The obvious flaw of destructing a type which essentially has to be trivially constructible and destructible to even want to do this in the first place? – Puppy Sep 18 '14 at 23:00
  • Yes, exactly. Your answer doesn't even mention that. I find that to be distressingly irresponsible coming from a high-rep user. – Lightness Races in Orbit Sep 18 '14 at 23:07
  • @LightnessRacesinOrbit I think this is conceptually the best answer, for those complaining about the lack of handling non-trivially destructible objects you can actually have a `static_assert` in `template void construct(U*){ static_assert(std::is_trivially_destructible{}, "this allocator cannot be used with non trivial types"); }`. – alfC Sep 30 '17 at 05:16
  • @Puppy, there is a small detail remaining. In the implementations I have handy, `vector` will loop across the allocated memory and placement construct each individual object. Of course in this case, each construction will do nothing, but the loop is still there. Are you relying in that this loop will be optimized away? – alfC Sep 30 '17 at 05:27
  • @Puppy, For the record, according to my tests, the element "construction" loop inside `vector` seems to be optimized away completely for `-O2` both in `gcc` and `clang`. – alfC Sep 30 '17 at 06:07
  • 2
    [This answer to a related question](https://stackoverflow.com/a/21028912/427158) contains a safe to use default initializing allocator. – maxschlepzig Nov 03 '18 at 14:29
3

1 It seems that using std::vector is neither necessary not sensible in your situation. You only want some object to manage some raw memory for you. This can be easily achieved by

std::unique_ptr<void, void(*)(void*)> p(std::malloc(n), std::free);

2 If you really want to use std::vector<> you could use the trick described here.

Community
  • 1
  • 1
Walter
  • 44,150
  • 20
  • 113
  • 196
-2

How about using vector.reserve() to only allocate the storage but not initialize it?

Marmel
  • 7
  • 2
    Presumably because the C function requires a properly sized buffer, and writing to reserved (but not resized) data is undefined behaviour. – Konrad Rudolph Jun 10 '13 at 15:23
  • This is not bad, specially for buffers. The problem is that you loose the ability of tracking the "size" if the vector, because`.size()` will return `0` and `capacity()` can be anything equal or larger than `reserve`. – alfC Sep 30 '17 at 19:59
  • @alfC It *is* **very** bad because the memory at or beyond `data() + size()` is not writable or readable per the Standard and trying to do so has undefined behaviour. If it's not `< size()`, it's not a location in memory that you can do anything with, except use a proper member function to create an object there. – underscore_d Jun 17 '19 at 20:55
  • @underscore_d, you are right but, Should at least be writable at by `placement new`? I think that is the case if reserve is guarantied to allocate, which I don't know if it is the case. Furthermore if the type `is_trivially_assignable` then it is writable by assignment `=`. No? Just in practice? In any case this is in the realm of trying to hack a "typed-"buffer from `std::vector`. (A typed buffer makes sense only for trivial types.) So it is a hack of a hack at worst. – alfC Jun 17 '19 at 22:59
  • @alfC If the Standard doesn't say that you're allowed to do hacks-of-hacks with `.reserve()`d memory, then I would presume that you're not, on pain of extreme UB. You already identified that even that being *practically* possible would depend on an implementation detail that you're not sure is guaranteed. At that point, anyone with such questions should accept it's not a `vector` they need. – underscore_d Jul 09 '19 at 09:42