
I am using Boost ASIO to send a message over a TCP stream. I send the body size first, in exactly 4 bytes. Then, on the server side, I make a vector<char>, resize it to 4 bytes, and put the message body size there. Here is how I convert the vector of char std::vector<char> size; to int:

_packet.body_size = static_cast<int>(_packet.size[0]);

This works when the value stored in _packet.size[0] is not bigger than 124.


In this scenario it works: body_size is set to 124, as you can see.

However, if the value gets bigger than 124, like 128 for example, I am not able to parse it correctly the same way as I did with 124.

Take a look:


See what number body_size is set to. Why am I not able to convert numbers bigger than 124?

Where is the mistake, and how can I fix it?

Venelin
  • signed `char` value range is from -128 to 127. Try `unsigned char` instead? – Yksisarvinen Oct 19 '20 at 12:06
  • You can't store 128 in a signed 8 bit integer (the usual default for `char`) – NathanOliver Oct 19 '20 at 12:07
  • Now I am curious if you also tried `125`, `126` and `127` – 463035818_is_not_an_ai Oct 19 '20 at 12:09
  • If you want your `int` to be composed of all 4 `char` values in the vector, then your simple cast won't work (it just casts the value of the first element). You'll need something more devious, like: `_packet.body_size = *(reinterpret_cast<int*>(_packet.size.data()));`. I'm not (yet) recommending that, but can you clarify? – Adrian Mole Oct 19 '20 at 12:11
  • @AdrianMole It worked. Doing this `*(reinterpret_cast<int*>(_packet.size.data()));` seems to have solved the problem completely. Why is that? Can you write a complete answer? – Venelin Oct 19 '20 at 12:16
  • @Venelin It will work *sometimes* on *some* platforms using *some* compilers. If I post just that as an answer, it will (justifiably) attract (lots of) downvotes. – Adrian Mole Oct 19 '20 at 12:18
  • Okay, thanks. What can I do to make it work in other cases as well? Can you write such an answer? – Venelin Oct 19 '20 at 12:19
  • Use `unsigned char` for arbitrary binary data. `char` can be either signed or unsigned, and (as you have noticed) the signed version can cause strange surprises. – molbdnilo Oct 19 '20 at 12:38
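To make the comments concrete, here is a minimal sketch; the helper names `first_byte_as_char` and `first_byte_as_unsigned` are hypothetical. It assumes 8-bit `char`. Whether plain `char` is signed is implementation-defined (on x86 it usually is), so byte values up to 127 survive the cast unchanged, while 0x80 (128) and above come back negative. Note that the question's cast also only ever reads the first byte, so the other three bytes of the size are ignored.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: what the question's cast does. Plain char may be
// signed, so bytes 0x80..0xFF can come back as negative numbers.
int first_byte_as_char(const std::vector<char>& bytes)
{
    assert(!bytes.empty());
    return static_cast<int>(bytes[0]);
}

// Hypothetical helper: reinterpret the byte as unsigned first, so the
// result is always in the range 0..255.
int first_byte_as_unsigned(const std::vector<char>& bytes)
{
    assert(!bytes.empty());
    return static_cast<int>(static_cast<unsigned char>(bytes[0]));
}
```

On a platform with signed `char`, `first_byte_as_char` returns -128 for the byte 0x80, while `first_byte_as_unsigned` returns 128.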

2 Answers


To construct a (4-byte/32-bit) integer from the 4 single-byte char values in your vector, and to do so safely, you will need to 'mask-in' each character into the relevant 8 bits of that integer.

You can do this using a combination of bitwise or (|) and bit-shift (<<) operations in a short loop:

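#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    // Low byte is 0x80 (128); where char is signed it is stored as -128
    std::vector<char> cVec = { static_cast<char>(0x80), 0, 0, 0 };
    uint32_t iVal = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // Convert via unsigned char so a "negative" char is not sign-extended,
        // and shift as uint32_t so shifting into the top bit is well-defined
        iVal |= static_cast<uint32_t>(static_cast<unsigned char>(cVec[i])) << (i * 8);
    }
    std::cout << iVal << std::endl; // prints 128
    return 0;
}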

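The above assumes the bytes were written in Little-Endian order (least-significant byte first), which is what you get when the sender copies an integer straight out of memory on a Little-Endian processor, such as the Intel x86/x64 family. Because the value is assembled with shifts, the endianness of the receiving machine does not matter; only the order of the bytes in the data does. If the data is Big-Endian (the usual convention for network protocols), add the bytes in reverse order, using the following as the 'body' of the for loop: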

        iVal |= static_cast<uint32_t>(static_cast<unsigned char>(cVec[3 - i])) << (i * 8);
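The two loop variants can be wrapped into small helpers. This is a sketch under the assumption that the size prefix is exactly 4 bytes; `read_u32_le` and `read_u32_be` are hypothetical names. Pick the one that matches the order in which the bytes were written into the packet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes[0] is the least-significant byte (little-endian data).
std::uint32_t read_u32_le(const std::vector<char>& bytes)
{
    assert(bytes.size() >= 4);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // unsigned char avoids sign extension; uint32_t keeps the shift well-defined
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i])) << (i * 8);
    }
    return value;
}

// Hypothetical helper: bytes[0] is the most-significant byte (big-endian / network order).
std::uint32_t read_u32_be(const std::vector<char>& bytes)
{
    assert(bytes.size() >= 4);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[3 - i])) << (i * 8);
    }
    return value;
}
```

With the question's variables, that would be something like `_packet.body_size = static_cast<int>(read_u32_le(_packet.size));`.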

Note: It may be tempting to just cast the address of the vector's data to a pointer-to-int and then dereference that (as I suggested in the comments). However, this is unsafe and introduces undefined behaviour, as it violates the Strict Aliasing Rules of the C++ language.

Adrian Mole
  • Instead of fixing endianness "by hand", I'd use ntohl() here - that makes it easier to read and more portable. @Venelin: for the record, in a network packet the order is big-endian by convention, so your packet size would read {0,0,0,124} rather than {124,0,0,0}. – Sander Oct 19 '20 at 15:20
  • @Sander It is Little-Endian in my case. Can you write an answer using `ntohl`? – Venelin Oct 20 '20 at 06:25
  • On the sending system, simply create a new `int body_size_nbo = htonl(body_size);` and send out your frame with `body_size_nbo` rather than `body_size`. On the receiving side, read out the integer into `body_size_nbo` as described in the answers and then convert using `body_size = ntohl(body_size_nbo);`. That should make sure your code can be used on machines with any endianness. – Sander Oct 20 '20 at 11:31
  • @Sander The OP seems to have abandoned this question and reposted the same thing [here](https://stackoverflow.com/q/64439721/10871073) - borrowing some of the information the answerers have provided (though using that info incorrectly)! – Adrian Mole Oct 20 '20 at 11:33
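A minimal sketch of the `htonl`/`ntohl` approach Sander describes, assuming a POSIX system where both functions are declared in `<arpa/inet.h>` (on Windows they come from `<winsock2.h>`). The function names `encode_size_prefix` and `decode_size_prefix` are hypothetical. `std::memcpy` moves the 4 bytes between the buffer and a `uint32_t` without breaking the aliasing rules.

```cpp
#include <arpa/inet.h> // htonl, ntohl (POSIX)
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Sender side (hypothetical): write the size in network (big-endian) byte order.
std::vector<char> encode_size_prefix(std::uint32_t body_size)
{
    std::uint32_t body_size_nbo = htonl(body_size);
    std::vector<char> prefix(sizeof(body_size_nbo));
    std::memcpy(prefix.data(), &body_size_nbo, sizeof(body_size_nbo));
    return prefix;
}

// Receiver side (hypothetical): copy the 4 bytes out, then convert to host order.
std::uint32_t decode_size_prefix(const std::vector<char>& prefix)
{
    assert(prefix.size() >= sizeof(std::uint32_t));
    std::uint32_t body_size_nbo = 0;
    std::memcpy(&body_size_nbo, prefix.data(), sizeof(body_size_nbo));
    return ntohl(body_size_nbo);
}
```

Both ends have to agree on the byte order: this changes the wire format to big-endian, so a prefix that is still written little-endian (as FlatBuffers does) must not be decoded with `ntohl`.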

To directly answer the question:

You've extracted just the first byte, then converted it to an integer. If the subsequent bytes have meaningful information, they are "lost".

What you intended was to reinterpret all four bytes as an integer.

That would look like this:

_packet.body_size = *reinterpret_cast<int*>(&_packet.size[0]);

However, the other answers are correct in that this is not safe. You cannot take a sequence of chars and pretend that an int object exists there. Contrary to popular belief, it's not "all just bytes". (Though it will often appear to work on your system, to be fair.)

The safe approach is std::memcpy:

// needs <cassert> and <cstring>
assert(_packet.size.size() >= sizeof(_packet.body_size));
std::memcpy(&_packet.body_size, _packet.size.data(), sizeof(_packet.body_size));

… or std::copy:

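// needs <algorithm> and <cassert>
assert(_packet.size.size() >= sizeof(_packet.body_size));
std::copy(
   _packet.size.begin(),
   _packet.size.begin() + sizeof(_packet.body_size), // copy exactly one int's worth of bytes
   reinterpret_cast<char*>(&_packet.body_size)       // a char* may point into the int
);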

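This works because the opposite conversion (examining an int's bytes through a `char` pointer) is valid and safe: the aliasing rules explicitly allow any object to be accessed as a sequence of `char` or `unsigned char`.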

You'd be better off, though, just bitmasking in the individual bytes to get endian-safety (and to ensure int size mismatches don't kill you!), as Adrian has already shown.

Asteroids With Wings
  • Without wishing to challenge your answer (it's good, as far as it goes) ... but I'm curious as to what else other than *"all just bytes"* would be in the vector's data? – Adrian Mole Oct 19 '20 at 12:52
  • Thanks for your answer @Asteroids With Wings. Can you please show an example about the better approach of `memcpy/std::copy` so the answer can provide the better approach as well ? – Venelin Oct 19 '20 at 13:15
  • @Venelin There ya go (untested) – Asteroids With Wings Oct 19 '20 at 13:35
  • @AsteroidsWithWings So are memcpy and std::copy any good, then, or is Adrian's approach safer? – Venelin Oct 19 '20 at 13:44
  • Yes, like I said, you'd be better off with Adrian's approach. Though, if you _know_ that your `int` size and endianness is the same as on the platform that produced the vector in the first place, a quick `copy` is easier... – Asteroids With Wings Oct 19 '20 at 13:51
  • Well, the size is prefixed by FlatBuffers (https://google.github.io/flatbuffers/class_flat_buffers_1_1_flat_buffer_builder.html#a92de6a8a35e1ae5a07f5578bb0fda16c) and it's exactly 32 bits, which by my calculations is exactly 4 bytes. So in this case I believe it's fine. – Venelin Oct 19 '20 at 13:55
  • Okay, according to the FlatBuffers documentation, the data is always little-endian, so **if** your target platform is little-endian then the copy will work fine. Doesn't the library have a built-in way to do this, though? – Asteroids With Wings Oct 19 '20 at 13:59
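Following up on the point that a plain copy is fine when the integer size and endianness match: since the size prefix discussed here is a little-endian 32-bit value (per the comments above), one option is to take the quick `std::memcpy` path only when the host is also little-endian and fall back to shifting otherwise. This is a sketch; `host_is_little_endian` and `read_size_prefix_le` are hypothetical helpers, not part of the FlatBuffers API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: detect host byte order at run time by looking at
// the first byte in memory of a known 32-bit value.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: read a 4-byte little-endian size prefix.
std::uint32_t read_size_prefix_le(const std::vector<char>& bytes)
{
    assert(bytes.size() >= sizeof(std::uint32_t));
    std::uint32_t value = 0;
    if (host_is_little_endian()) {
        // Same byte order as the data: a plain copy is enough.
        std::memcpy(&value, bytes.data(), sizeof(value));
    } else {
        // Different byte order: assemble the value byte by byte.
        for (unsigned i = 0; i < 4; ++i) {
            value |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i])) << (i * 8);
        }
    }
    return value;
}
```

Both branches return the same result; the check only decides whether the byte-by-byte loop can be skipped.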