5

I want to store the data from a std::vector<short> in a std::vector<uint8_t>, splitting each short into two uint8_t values. I need this because my network application will only send a std::vector<uint8_t>, so I have to convert to uint8_t before sending and convert back when I receive the uint8_t vector.

Normally what I would do (and what I saw when I looked up the problem) is:

std::vector<uint8_t> newVec(oldvec.begin(),oldvec.end());

However, if I understand correctly, this takes each individual short value, truncates it to the size of a uint8_t, and produces a new vector with the same number of entries but half the amount of data. What I want is the same amount of data with twice as many entries.
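To make the concern concrete, here is a minimal sketch of what the range constructor does. The helper name `truncatingCopy` is made up for illustration; it only wraps the one-liner from the question.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper wrapping the range constructor from the question.
// Each short is converted element by element to a uint8_t, so the result
// has the same number of entries and only the low 8 bits of each value.
std::vector<std::uint8_t> truncatingCopy(const std::vector<short>& oldvec) {
    return std::vector<std::uint8_t>(oldvec.begin(), oldvec.end());
}
```

With an input of `{0x1234, 0x00FF}` the result has two entries, `0x34` and `0xFF`: the upper byte of every value is lost, which is why this constructor cannot be used for a lossless split.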

Solutions that include a way to reverse the process and that avoid copying as much as possible would help a lot. Thanks!

Yunnosch
S. Casey
  • 3
    Just `memcpy` data() from one to another? – user7860670 Aug 21 '17 at 21:27
  • 4
    Not bad, @VTT , but leaves room for some endian weirds. – user4581301 Aug 21 '17 at 21:31
  • 1
    Take some time to proofread what you wrote. Phrases like `store the data in a std::vector in a std::vector` are unlikely to engender sympathy. – Mikhail Aug 21 '17 at 21:31
  • 1
    @Mikhail the relevant difference was hidden inside angulars. I edited to make visible. – Yunnosch Aug 21 '17 at 21:33
  • Can you explain how you want them split exactly? For example, if the original `vector` contains a single 1, what should the output `vector` contain? – David Schwartz Aug 21 '17 at 22:32
  • 1
    @DavidSchwartz I'm going to be using big endian, so the short 0000000000000001 would be stored as two entries in the uint8_t vector, 00000000 and 00000001. Theoretically I shouldn't have to deal with endian issues since the code is always being run on the same computers and I have control over both the serialization and deserialization, but I'm trying to make it as portable as I can. I'm not sure how I could rebuild the shorts correctly on a machine with different endianness without knowing the endianness first. Should I have an if statement on rebuild that checks endianness? – S. Casey Aug 22 '17 at 14:05
  • 1
    @S.Casey You just need to put the data on the wire in network byte order (which is in fact big endian). On the client or server side you can always use the `htonx()` / `ntohx()` family of functions (for 16-bit values, `htons()` and `ntohs()`) to convert from host byte order to network byte order and vice versa, without needing to know the endianness of the host machine. – user0042 Aug 22 '17 at 14:10

5 Answers

6

To split a value at the 8-bit boundary, you can use right shifts and masks, i.e.

uint16_t val = 0x1234;             // example value
uint8_t low = val & 0xFF;          // lower byte: 0x34
uint8_t high = (val >> 8) & 0xFF;  // upper byte: 0x12

Now you can put high and low into the second vector in whichever order you need.

Serge
3

For splitting and merging, you would have the following:

unsigned short oldShort = 0xAABB; // example value
uint8_t char1 = oldShort & 0xFF; // lower byte
uint8_t char2 = (oldShort >> 8) & 0xFF; // upper byte

Then push the two parts onto the vector, and send it off to your network library. On the receiving end, during re-assembly, you would read the next two bytes off of the vector and combine them back into the short.

Note: Make sure that the received vector has an even number of elements. An odd count means the data was truncated or otherwise corrupted in transit (an even count alone does not prove the data is intact).

// Read off the next two characters and merge them again
unsigned short mergedShort = (char2 << 8) | char1;
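
Put together for whole vectors, the split and merge steps could look like the sketch below. The function names `splitShorts` and `mergeBytes` are made up for illustration, and the low-byte-first order is an assumption (the answer leaves the order open); it also assumes `unsigned short` is 16 bits wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: split every short into two bytes, low byte first.
std::vector<std::uint8_t> splitShorts(const std::vector<unsigned short>& shorts) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(shorts.size() * 2);
    for (unsigned short s : shorts) {
        bytes.push_back(static_cast<std::uint8_t>(s & 0xFF));        // lower byte
        bytes.push_back(static_cast<std::uint8_t>((s >> 8) & 0xFF)); // upper byte
    }
    return bytes;
}

// Hypothetical helper: rebuild the shorts, rejecting an odd byte count.
std::vector<unsigned short> mergeBytes(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() % 2 != 0) {
        throw std::invalid_argument("odd number of bytes");
    }
    std::vector<unsigned short> shorts;
    shorts.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        // bytes[i] is the lower byte, bytes[i + 1] the upper byte
        shorts.push_back(static_cast<unsigned short>((bytes[i + 1] << 8) | bytes[i]));
    }
    return shorts;
}
```

Because the byte order is fixed by the shifts rather than by the machine's memory layout, `mergeBytes(splitShorts(v))` should give back `v` regardless of host endianness.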
WindyFields
Jeff Geisperger
2

I need to do this because I have a network application (1) that will only send std::vector<uint8_t>'s

Besides masking and bit shifting, you should take endianness into account when sending data over the wire.

The network representation of data is usually big endian, so you can always put the most significant byte (MSB) first. Provide a simple function like:

std::vector<uint8_t> networkSerialize(const std::vector<uint16_t>& input) {
    std::vector<uint8_t> output;
    output.reserve(input.size() * sizeof(uint16_t)); // Pre-allocate for sake of
                                                     // performance
    for(auto snumber : input) {
        output.push_back((snumber & 0xFF00) >> 8); // Extract the MSB
        output.push_back((snumber & 0xFF)); // Extract the LSB
    }
    return output;
}

and use it like

std::vector<uint8_t> newVec = networkSerialize(oldvec);

See live demo.
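
Since the question also asks for the reverse direction, a matching decoder could be sketched as follows. The name `networkDeserialize` is an assumption chosen to mirror `networkSerialize`; it is not part of the original answer. It expects the most significant byte first, as produced above.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical counterpart to networkSerialize: reads big-endian byte pairs.
std::vector<std::uint16_t> networkDeserialize(const std::vector<std::uint8_t>& input) {
    if (input.size() % sizeof(std::uint16_t) != 0) {
        throw std::invalid_argument("byte count is not a multiple of 2");
    }
    std::vector<std::uint16_t> output;
    output.reserve(input.size() / sizeof(std::uint16_t));
    for (std::size_t i = 0; i < input.size(); i += 2) {
        // input[i] is the MSB, input[i + 1] the LSB
        output.push_back(static_cast<std::uint16_t>((input[i] << 8) | input[i + 1]));
    }
    return output;
}
```

Neither direction inspects the host's byte order, so the pair should work unchanged on little-endian and big-endian machines.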


1)Emphasis mine

user0042
1

Disclaimer: People are talking about "network byte order". If you send something larger than 1 byte, of course you need to take network endianness into account. However, as far as I understand, the limitation "network application that will only send std::vector<uint8_t>" effectively says "I don't want to deal with any of that endianness stuff". A uint8_t is already a single byte, and if you send a sequence of bytes in one order, you should get them back in exactly the same order. This is helpful: sending the array through a socket.
The client and server machines may have different system endianness, but the OP said nothing about that, so that is a different story...

Regarding the answer: assuming all endianness questions are settled, if you just want to send a vector of shorts, I believe VTT's answer will perform best. However, if std::vector<short> is just one particular case, you can use the pack() function from my answer to a similar question. It packs any iterable container, string, C-string and more into a vector of bytes and does not perform any endianness shenanigans.
Just include byte_pack.h and then you can use it like this:

#include <cstdint>
#include <iomanip>
#include <iostream>
#include <vector>

#include "byte_pack.h"

void cout_bytes(const std::vector<std::uint8_t>& bytes)
{
    for(unsigned byte : bytes) {
        std::cout << "0x" << std::setfill('0') << std::setw(2) << std::hex
                   << byte << " ";
    }
    std::cout << std::endl;
}


int main()
{
    std::vector<short> test = { (short) 0xaabb, (short) 0xccdd };
    std::vector<std::uint8_t> test_result = pack(test);

    cout_bytes(test_result); // -> 0xbb 0xaa 0xdd 0xcc (low byte first on a little-endian host)

    return 0;
}
WindyFields
0

Just copy everything in one go:

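#include <cstdint>
#include <cstring>
#include <vector>

::std::vector<short> shorts;
// populate shorts...
::std::vector<uint8_t> bytes;
::std::size_t const bytes_count(shorts.size() * sizeof(short) / sizeof(uint8_t));
bytes.resize(bytes_count); // one byte per sizeof(short) unit of input
::std::memcpy(bytes.data(), shorts.data(), bytes_count); // raw copy in host byte order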
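
The receiving side can reverse this with the same kind of copy. The sketch below uses a made-up helper name, `bytesToShorts`, and assumes both ends share the same byte order and the same `sizeof(short)`, since the bytes are copied as-is.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy raw bytes back into shorts in host byte order.
std::vector<short> bytesToShorts(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() % sizeof(short) != 0) {
        throw std::invalid_argument("byte count is not a multiple of sizeof(short)");
    }
    std::vector<short> shorts(bytes.size() / sizeof(short));
    if (!bytes.empty()) {
        std::memcpy(shorts.data(), bytes.data(), bytes.size());
    }
    return shorts;
}
```

Copying through `memcpy` rather than casting `bytes.data()` to `short*` avoids alignment and aliasing problems, at the cost of one extra copy.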
user7860670
  • 2
    _"Notice that this approach does not deal with Endianness in any way since it was not mentioned in the question. "_ Mentioning **network** transport implies that. – user0042 Aug 21 '17 at 22:07
  • @user0042 No, just mentioning **network** transport does not imply that in any way. Moreover, **network** does not imply that data should be transferred in network or any other particular byte order like other posts imply. And it does not imply that application is running on Little Endian platform either. So byte order flips suggested in other questions could actually screw up original Big Endian order. – user7860670 Aug 21 '17 at 22:17
  • You can always go back through the `vector` afterwards to apply endian transformations if needed, eg: `for (int i = 0; i < bytes.size(); i += sizeof(short)) { short *s = reinterpret_cast<short*>(&bytes[i]); *s = htons(*s); }` – Remy Lebeau Aug 21 '17 at 23:18
  • Can someone explain to me in what way this answer is bad? How can big endianness used internally by a given connection change the output on the other end of the wire? If you send "0x1, 0x2, 0x3, 0x4" over any normal network connection you will receive "0x1, 0x2, 0x3, 0x4", am I wrong? – WindyFields Aug 22 '17 at 08:22
  • @WindyFields This answer is not bad, as it answers the given question without taking wild guesses about the endianness handling of the underlying protocol and application requirements. The problem with endianness can happen, for example, if the sender machine is Little Endian and the receiver machine is Big Endian. Sending 1234 short will be received as 3364. So to transmit binary data between these two machines, the programmer will need to define how endianness is handled in the data exchange protocol being implemented. However, "send everything in network byte order" is just one of the possible approaches. – user7860670 Aug 22 '17 at 08:35
  • I understand that, but the OP did not ask to send shorts - he wants to send bytes. How can big endianness change the order of the bytes he sends? I upvoted your answer btw. – WindyFields Aug 22 '17 at 08:41
  • @WindyFields The OP is actually trying to send a vector of shorts, but he needs to store it in a vector of bytes because the actual send method accepts only vectors of bytes. – user7860670 Aug 22 '17 at 08:47
  • @VTT, sure, I just want to say that because the connection sends a pure sequence of bytes, it does not change their order. – WindyFields Aug 22 '17 at 09:03
  • @WindyFields Yes, sending sequence of bytes will not change their order anyhow. However if source and destination machine endianness differs then extracting shorts from the same sequence of bytes will produce different results. For example you send 1234 short from LE machine, it will send 0xD2 0x04, however the same 0xD2 0x04 sequence on BE machine will be interpreted as 53764 short (note that I had a typo in my earlier reply). – user7860670 Aug 22 '17 at 09:12
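
The 1234 versus 53764 example from the last comment can be checked with a small sketch. The function names `readLittleEndian` and `readBigEndian` are made up for illustration; each one interprets the same two received bytes the way a machine of that endianness would.

```cpp
#include <cstdint>

// Hypothetical helper: first byte is the least significant one.
constexpr std::uint16_t readLittleEndian(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>(first | (second << 8));
}

// Hypothetical helper: first byte is the most significant one.
constexpr std::uint16_t readBigEndian(std::uint8_t first, std::uint8_t second) {
    return static_cast<std::uint16_t>((first << 8) | second);
}

// The bytes 0xD2 0x04 arrive in the same order on both machines,
// but they decode to different values.
static_assert(readLittleEndian(0xD2, 0x04) == 1234, "little-endian reading");
static_assert(readBigEndian(0xD2, 0x04) == 53764, "big-endian reading");
```

The byte sequence itself is never reordered in transit; only the interpretation differs, which is why both ends have to agree on one byte order for multi-byte values.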