1

E.g given vector<uint8> of length 100, how to create new vector<uint16> of 50 elements. Preferably without copying?

(Edit: info from my comments)

To illustrate:

I have a uint16 grayscale image file, my 3rd party lib returns a vector of uint8. Every 2 bytes = 1 pixel. It is practical for me to work with a vector of uint16. I think the only difference between this vector<uint8> and a corresponding vector<uint16> is that the bytes are read in a "concatenated" manner (i.e. chunks of 2 bytes = 1 value).

I could loop and combine every 2 elements into a new vector element, but that seems inefficient, since the memory layout is the same. I was hoping I could combine some cast and maybe a move constructor to create a vector<uint16> --without copying the original vector<uint8> bytes again.

Edit 2: To dispel any possible misunderstandings I drew a picture, forgive my poor ascii art :)

container of uint8 values in memory:

[ _ ] | [ _ ] | [ _ ] | [ _ ] ...
|^^|
accessing element = accessing 1 byte

container of uint16 values in memory is also just a sequence of bytes;

[ _ ] | [ _ ] | [ _ ] | [ _ ] ...

|^ ^ ^ ^ ^|
accessing element = accessing 2 bytes (lets say my system read this as big-endian)

I already have the sequence of bytes in vector v1. I just want a v2 of different type so I can access them differently. If it turns out the uint16 values are read as little-endian I could still work with that.

Edit 3: So far it seems the answer by black is the best to my understanding (I will accept later if nothing changes), it still seems odd that there is no simple or STL solution to this. Though thanks to everyone for their prompt input and patience with my attempts at explaining myself.

Mohit Jain
  • 30,259
  • 8
  • 73
  • 100
jiggunjer
  • 1,905
  • 1
  • 19
  • 18
  • My first try would be to just cast it.. Doesn't this work? – frarugi87 Jul 15 '15 at 10:29
  • Cast how? There's no implicit conversion, and you can't just reinterpret it. – MSalters Jul 15 '15 at 10:44
  • Are you sure doing it without copying is preferable? Judging from the seemingly speculative nature of the question, it sounds like you don't actually have any information indicating that it's worth doing complicated things to avoid copying. –  Jul 15 '15 at 11:27
  • @ Hurkyl I added some context, does it convince you that copying may not be preferable? – jiggunjer Jul 15 '15 at 11:38
  • @jiggunjer: It actually, IMO, argues that you shouldn't be worrying about the copy. Your description sounds like you're trying to optimize an inconsequential part of your program. Also, having a `vector` return value suggests to me that the authors of the third party library also believe doing the memory juggling will be inconsequential -- e.g. because the work to obtain the elements of the vector, or the work that will be done on it, vastly outweigh any costs of doing extra copies. When I write functions where that isn't true, you pass memory into them, not them returning memory to you. –  Jul 15 '15 at 11:44
  • There is no STL way to do this, because it would be way too much for the C++ standard to insist that what you're trying to do is meaningful. At the very least, if your `uint8_t` array has elements `0,1,2,3,4,...`, then half of computers would, when reinterpreting it as `uint16_t`, want the array to be `0x0001, 0x0203, 0x0405, ...` and the other half want `0x0100, 0x0302, 0x0504, ...`. And we haven't even gotten into all the things compilers want to do to make your code run fast that they could not do if the standard insisted these sorts of things were meaningful. –  Jul 15 '15 at 14:55

4 Answers4

4

As you don't control the source (per your comment), you can't know that the input vector has a 2-byte aligned buffer. For that reason alone, you have to copy the input vector.

How you do it won't matter much; memory access speed probably dominates the run time. However, do call reserve(50) on the destination vector - having multiple allocations will slow down the program.

MSalters
  • 173,980
  • 10
  • 155
  • 350
  • Is it possible to statically assert that the contents of the vectors are aligned properly? – Aaron McDaid Jul 15 '15 at 10:55
  • On some (possibly possible) system this may be an issue, but on most of the system vector data should be allocated with `new` (the way default allocator does) which is suitably aligned. Or one can pass custom allocator (forces alignment). What are your views about this? Is it safe to reinterpret `std::vector::data` with/without custom allocator. (Ignoring endian-ness) – Mohit Jain Jul 15 '15 at 11:01
  • @MohitJain: Well, `vector< >` may not use `new` directly but must use `std::allocator`. And that may subdivide allocations. Thus, after one allocation of an odd number of bytes, subsequent allocations could very well return a byte-aligned allocation. (Keep in mind that the odd allocation could be somewhere in `std::string`) – MSalters Jul 15 '15 at 11:07
  • I thought the fact that memory in a vector is continuous/contiguous was good enough. I'll have to read up on vector byte alignment... – jiggunjer Jul 15 '15 at 11:27
1

You can write a wrapper to do the conversion for you when needed. For example (without template)

static inline uint16 getElement(const vector<uint8> &p, size_t index) {
  const int idx = index * 2;
  return p[idx] | p[idx + 1] << 8;
}
Mohit Jain
  • 30,259
  • 8
  • 73
  • 100
  • I already considered casting a uint16 pointer to v.data(). But I was specifically hoping to use std::vector. – jiggunjer Jul 15 '15 at 10:34
  • @jiggunjer Can you please clarify a little more? – Mohit Jain Jul 15 '15 at 10:35
  • Yes but you can't read from a `uint16_t*` from `uint8_t*` because you break the strict-aliasing rule. – edmz Jul 15 '15 at 10:38
  • I have a uint16 image file, my 3rd party lib returns a vector of uint8. Every 2 byte = 1 pixel. I want to work with a vector of uint16 by converting the returned vec. – jiggunjer Jul 15 '15 at 10:42
  • I could loop and sum every 2 elements into a new vector, but that seems inefficient, since the memory layout is the same. – jiggunjer Jul 15 '15 at 10:44
  • @jiggunjer If you don't want to reinterpret the `.data`, why don't you index directly from `uin8 vector` and convert it as and when needed. – Mohit Jain Jul 15 '15 at 10:48
  • @jiggunjer:Using `vector` would be entirely Undefined Behavior. For instance, you have no idea if it stores the length in bytes or elements, but you assume it stores the length in bytes (as you expect a vector of 50*2 bytes). There could also be alignment issues (as `vector` unlike `malloc` only has to align to 1 byte) or even worse, template specializations (so `vector` could be utterly unrelated to any other vector). – MSalters Jul 15 '15 at 10:48
  • No idea what this is good for. But if you think you need such a trash, you have a very big design problem which you should solve before implementing the trash! – Klaus Jul 15 '15 at 10:48
  • @Mohit jain because I thought that would add extra overhead or complexity. – jiggunjer Jul 15 '15 at 10:51
  • 1
    In addition: What is with byte order / endianness? – Klaus Jul 15 '15 at 10:56
0

You would have to typecast, and I don't think you do it without copying in this case.

Vectors are a contiguous block of memory, and uint8 will result in all numbers occupying 8 bits and next to each other in memory. Now when you cast it to a uint16, every number will require an additional 8 bits, and you can't magically insert memory in-between a contiguous block. So, copying will occur.

Without the constraint of not copying, the problem is not hard, and I'm sure you can manage.

EDIT: In response to the comment, even if the vector has space for 100 elements, but has less than 50 elements in it, typecasting will still require copying. The first two elements will occupy 16 bits, whereas later on the first element will be occupying those 16 bits. As a result, you have to at least copy the second element. In a similar logic, you will have to copy other elements.

therainmaker
  • 4,253
  • 1
  • 22
  • 41
0

vector<std::uint8_t> will hold a std::uint8_t* whilst vector<std::uint16_t> a std::uint16_t*. What you want to basically do is share the two pointers and give different interpretations, something like

auto ptr = reinterpret_cast<std::uint16_t*>( vectorOfUint8.data() )

This is fine as long as you don't read/write through that pointer because such operation will cause undefined behavior due the strict-aliasing rule. You need to copy, which can be optimized to be very efficient with SIMD. If your compiler can't do it automatically, you can use intrinsics or unroll it:

  1. Read two elements from v1 to v2 so that they get 96 (48 in v2) manually
  2. Loop over v2 and read 4 (i += 48 / 4 = 12) or 8 (i += 48 / 8 = 6) elements at time

You could also disable the strict-aliasing rule, though that's compiler-specific and hence not standard and portable.

Community
  • 1
  • 1
edmz
  • 8,220
  • 2
  • 26
  • 45
  • `undefined behavior due the strict-aliasing rule` That actually depends, `uint8_t` is usually a typedef for `unsigned char`, if this is the case, strict aliasing does not apply. – sbabbi Jul 15 '15 at 11:07
  • It still does: if you read _from_ a char* to whatever other type it's ok, but not the opposite. – edmz Jul 15 '15 at 11:13
  • hmmm a pointer that can't be used to read or write is a bit useless. So the normal workaround is to just copy the data...Also I wouldn't mind if v1 became useless after creating v2 (was thinking move semantic) – jiggunjer Jul 15 '15 at 15:03