3

Many languages and frameworks offer a "byte array" type, but the C++ standard library does not. What type is appropriate for medium-sized¹, resizable byte arrays, and how can I use that type efficiently (in particular: allocating, passing as parameters, and destroying)?


1: By medium-sized I mean less than 100 MB.

Tamás Szelei

6 Answers

5

You can use std::vector<unsigned char> or, as @Oli suggested, std::vector<uint8_t>. Yes, you can pass it around without copying the whole contents.

#include <vector>

void f(std::vector<unsigned char>& byteArray) // pass by reference: no copy!
{
    // read or modify byteArray here
}

int main()
{
    std::vector<unsigned char> byteArray;
    // ...
    f(byteArray); // no copy is made!
}
Nawaz
3

Many languages and frameworks offer a "byte array" type, but the C++ standard library does not.

You're wrong here: C++ has a byte array type, std::vector<unsigned char>, whose storage is guaranteed to be contiguous (there are other alternatives if you do not need this guarantee). You may want to read about references, move semantics, return value optimization and copy elision to learn how to handle such buffers efficiently.

Note: a byte, in C++ speak, is a char (either signed or unsigned). It may not be 8 bits long; you can get its size in bits via the CHAR_BIT macro from <climits>.
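
To make the points about references and move semantics concrete, here is a minimal sketch. The names ByteVector, countZeros, makeBuffer and Packet are hypothetical and chosen only for illustration; the sketch assumes C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using ByteVector = std::vector<unsigned char>; // hypothetical alias

// Read-only access: pass by const reference, nothing is copied.
std::size_t countZeros(const ByteVector& bytes)
{
    std::size_t zeros = 0;
    for (unsigned char b : bytes)
        if (b == 0) ++zeros;
    return zeros;
}

// Returning by value is cheap: the local is moved out or the copy is elided.
ByteVector makeBuffer(std::size_t size, unsigned char fill)
{
    ByteVector buffer(size, fill);
    return buffer;
}

// Taking ownership: accept by value and move into the member.
struct Packet // hypothetical type
{
    explicit Packet(ByteVector payload) : payload_(std::move(payload)) {}
    ByteVector payload_;
};
```

A caller that no longer needs its buffer can write Packet p(std::move(buf)); the heap storage is then handed over rather than duplicated. Destruction needs no explicit code, since the vector releases its storage when it goes out of scope.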

Alexandre C.
  • Note that `char` is not really a reliable _byte_ since it is either signed or unsigned IIRC. – D.Shawley Apr 09 '12 at 18:26
  • @D.Shawley: Yes, you don't know whether it is signed or unsigned. Better to have it unsigned if you want to make sense of the numbers as something modulo 2^n. – Alexandre C. Apr 09 '12 at 18:28
0

I would recommend using std::deque<uint8_t> instead of std::vector<uint8_t>, since the latter requires a single contiguous block of memory. I would steer clear of allocating large blocks of memory with new since it will initialize the block of memory using the default constructor which might be a little more expensive than you want.

In a pinch, I believe that you can give boost::shared_ptr a custom deleter, so that you allocate with std::malloc (avoiding the initialization overhead) and deallocate with std::free while still keeping the benefits that shared_ptr brings to the table.
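
A rough sketch of that idea follows. It swaps in std::shared_ptr (C++11) for boost::shared_ptr, on the assumption that the custom-deleter constructor is used the same way; makeSharedBytes is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical helper: allocates `size` uninitialized bytes with std::malloc
// and returns a shared_ptr whose deleter releases them with std::free.
std::shared_ptr<unsigned char> makeSharedBytes(std::size_t size)
{
    void* raw = std::malloc(size);
    if (raw == nullptr)
        throw std::bad_alloc(); // note: malloc(0) may also return nullptr

    // If constructing the shared_ptr itself throws, the deleter is still
    // called on the pointer, so the block is not leaked.
    return std::shared_ptr<unsigned char>(
        static_cast<unsigned char*>(raw),
        [](unsigned char* p) { std::free(p); });
}
```

The buffer's size is not stored anywhere, so callers have to track it separately; that is one practical drawback compared with a container.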

D.Shawley
  • 3
    "*I would steer clear of allocating large blocks of memory with `new` since it will initialize the block of memory using the default constructor which might be a little more expensive than you want.*" If you're talking about raw C-arrays, whether the data is initialized or not depends on whether one uses default-initialization or value-initialization (i.e. `new unsigned char[num]` vs `new unsigned char[num]()`). The former is _not_ guaranteed to initialize the memory, the latter is. `malloc` is never a good suggestion. – ildjarn Apr 09 '12 at 19:30
  • IIRC, `std::vector` uses the initializing version. It probably does not matter a lot, but there is quite a difference in the performance of the options. For example, allocating a 100,000,000-element `vector` takes 177ms, whereas `new`ing it with initialization takes 253ms, without initialization takes 0.051ms, and `malloc` takes 0.030ms. I ran into this when I was using `std::vector` for largish protocol buffers. – D.Shawley Apr 09 '12 at 20:40
  • `std::vector<>` doesn't use `new` at all, it uses the allocator you specify as the second template argument. You're right that it will initialize elements if using `resize` or the constructor that takes a size, but this is what `reserve` is for. :-] – ildjarn Apr 09 '12 at 20:49
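
The difference between reserve and resize described in the last comment can be sketched as follows; readInto is a hypothetical helper, and no timings are claimed for it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies n bytes from src into a new vector without
// zero-filling the storage first.
std::vector<unsigned char> readInto(const unsigned char* src, std::size_t n)
{
    std::vector<unsigned char> out;
    out.reserve(n);                      // allocates; size() stays 0, no elements are created
    out.insert(out.end(), src, src + n); // elements are constructed directly from src
    return out;
}

// By contrast, both of these value-initialize (zero) every element:
//   std::vector<unsigned char> v(n);
//   v.resize(n);
```

After reserve, only indices below size() may be accessed; writing into the reserved but unconstructed region through operator[] or data() is undefined behavior.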
0

vector<char> should be fine for your purposes. If you want a shared version to avoid copying you can use the following:

typedef shared_ptr<vector<uint8_t>> ByteArray;
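
As a small illustration of how such a typedef might be used (makeByteArray is a hypothetical factory, and the std:: qualifications assume C++11):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

typedef std::shared_ptr<std::vector<std::uint8_t>> ByteArray;

// Hypothetical factory: creates a shared, zero-filled buffer of `size` bytes.
ByteArray makeByteArray(std::size_t size)
{
    return std::make_shared<std::vector<std::uint8_t>>(size);
}
```

Copying a ByteArray copies only the pointer and bumps a reference count, so every copy refers to the same bytes; a write through one handle is visible through all the others.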

If you know the size at compile time, you can use std::array, which is slightly more space-efficient.

Also, std::string can hold embedded null characters, which may or may not make it more appropriate than vector.
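
Both alternatives are sketched below, assuming C++11; the function names and byte values are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <string>

// Fixed size known at compile time: no heap allocation, no stored capacity.
std::array<unsigned char, 4> fixedHeader() // hypothetical example data
{
    return {{0x89, 'P', 'N', 'G'}};
}

// std::string keeps embedded nulls when it is given an explicit length.
std::string withNulls()
{
    const char raw[] = {'a', '\0', 'b', '\0'};
    return std::string(raw, sizeof raw); // size() == 4
}
```

The length argument matters: constructing from a plain C string, as in std::string("a\0b"), stops at the first null character.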

Some implementations provide a rope as an extension (http://en.wikipedia.org/wiki/Rope, http://www.aoc.nrao.edu/php/tjuerges/ALMA/STL/html-3.4.6/rope.html), which may be more appropriate.

Andrew Tomazos
  • Is it even legal for an implementation to share `std::vector<>` contents? I know there are special permissions for `std::string`, but there are [known problems with COW semantics for strings](http://www.sgi.com/tech/stl/string_discussion.html). And how do you know that "most implementations" actually implement this? – André Caron Apr 09 '12 at 18:32
  • @AndréCaron: I thought I saw it in gcc and msvc. Hang on I'll check (gcc at least). – Andrew Tomazos Apr 09 '12 at 18:34
  • @AndréCaron: `std::string` is COW on gcc 4.7. Checking vector. – Andrew Tomazos Apr 09 '12 at 18:37
  • @AndréCaron: Ok vector copy constructor calls uninitialized_copy_a. Must have misremembered. Will update answer. – Andrew Tomazos Apr 09 '12 at 18:41
0

There are performance reasons for using unique_ptr instead, at least for relatively large buffers. See https://stackoverflow.com/a/35798248/1992615 for details.
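
The linked answer is not reproduced here. One performance argument raised in the comments elsewhere on this page is that a plain new[] does not have to zero the memory, and a unique_ptr can own such a block. A minimal sketch under that assumption (makeRawBuffer is a hypothetical helper; C++11):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: owns `size` bytes that are default-initialized,
// meaning their contents are indeterminate until written.
std::unique_ptr<unsigned char[]> makeRawBuffer(std::size_t size)
{
    return std::unique_ptr<unsigned char[]>(new unsigned char[size]);
}
```

Note that std::make_unique<unsigned char[]>(size) from C++14 value-initializes the array, so it zero-fills the block. As with any raw array, the size has to be tracked separately, and the buffer cannot grow in place.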

Community
Nanki
0

Since C++17 you can use

std::vector<std::byte> buffer;
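
A short sketch of working with such a buffer, assuming a C++17 compiler; toBytes and lowNibble are hypothetical helpers. std::byte is not an arithmetic type, so values are converted explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: converts raw unsigned chars into std::byte values.
std::vector<std::byte> toBytes(const unsigned char* src, std::size_t n)
{
    std::vector<std::byte> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(std::byte{src[i]});
    return out;
}

// Bitwise operators work on std::byte; arithmetic requires std::to_integer.
int lowNibble(std::byte b)
{
    return std::to_integer<int>(b & std::byte{0x0F});
}
```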
HolyBlackCat