2

I am having some difficulty trying to reinterpret data to pull information out of messages. I've tried to recreate the problem here.

I am receiving a series of long integers (32 bits) by popping them off a stack. I need to assemble these into 4 word(16 byte) packets. The struct I recreated below resembles the first word of a given packet. The difficulty I am having is that in order to determine which word is the starting packet, as well as which type of packet is which I need to be able to read the octal value of the data in the s5 member of the struct.
Simply put, for each message, I need to interpret bits 16-31 as a 16 bit integer regardless if it crosses bit boundaries on other messages.

I would have thought this would be a much easier task, but I cannot seem to get it to work. Here is what I have tried. I'm just getting Null values.

struct S
{
    uint8_t s1  :8;
    short s2    :2;
    bool s3 :1;
    int s4  :5;
    uint16_t s5 :16;
};

int main() {
    S s;
    s.s1 = 3;
    s.s2 = 2;
    s.s3 = true;
    s.s4 = 1;
    s.s5 = 02050;
    long l;
    memcpy(&l, &s, sizeof(S));
    std::deque<long> d;
    d.push_back(l);
    cout << *((uint16_t*)(&d.front()+2)) <<endl;
BLUEPIXY
  • 39,699
  • 7
  • 33
  • 70
mreff555
  • 1,049
  • 1
  • 11
  • 21
  • 1
    You are copying something the size of 'S' into a long? A long isn't big enough to hold all that EDIT: my apologies, I misread that. Still I would check your structure packing and be sure that sizeof(S) == sizeof (long) – Joe Oct 23 '17 at 19:34
  • The longs on our system are 32 bits, or four bytes. – mreff555 Oct 23 '17 at 19:37
  • Yeah sorry I posted too quickly. But are you sure that sizeof(S) == sizeof(long)? – Joe Oct 23 '17 at 19:38
  • 1
    I did verify. S is 4 bytes in size. Good observation though. Had I used integers for s2 and s3 I would have ended up with a larger struct unless it was packed. – mreff555 Oct 23 '17 at 19:40
  • Also in line where you do 'd.front()+2' What are you trying to do exactly? – Joe Oct 23 '17 at 19:40
  • I am trying to advance the offset two bytes and resolve as a 16 bit int – mreff555 Oct 23 '17 at 19:43
  • 2
    The layout of bits in a bitfield is implementation defined. The fields might not even be in the same order as defined. – dbush Oct 23 '17 at 19:43
  • Ok, valid point. Like I said, I am trying to recreate the problem here. Can we make the assumption that the longs within the deque are defined by the sizes within the struct? My only actual concern is the interpretation in the cout statement. – mreff555 Oct 23 '17 at 19:45
  • `&d.front() + 2` will advance by 8 bytes because `d.front()` is a `long`. – alain Oct 23 '17 at 19:46
  • Interesting. I thought when working with memory addresses, '+2' would work on a byte level. Regardless I'm getting zero when I raise the value to 4. – mreff555 Oct 23 '17 at 19:55
  • Dan, I edited my answer below to include a new solution to your problem. – Daniel Trugman Oct 24 '17 at 06:20

2 Answers2

1

If you already have the long value from your stream, why not just use bit shifting?

Assuming the data is big-endian, you could just shift off the first 16 bits, to get your octal value:

// 69733891 is the big-endian integral value represented by
// your posted sample data, so the octal value should be 02050,
// or as an int 1064
long l = 69733891; 
uint16_t s5 = l >> 16; // shift off to get the high value (s5)

For little endian (as is implied in your post), you could use a bitwise AND:

uint16_t s5 = l & 0xFFFF;

And for a quick comparison of the assembly generated for a bit shift, versus a pointer alias, here's what GCC generated (no optimizations):

Assembly generated for bit shift (note that SAR is the single instruction to perform a right bit shift):

' uint16_t s5 = l >> 16;
mov    rax,QWORD PTR [rbp-0x18]
sar    rax,0x10
mov    WORD PTR [rbp-0x1a],ax

Assembly generated for pointer alias:

' uint16_t s5 = *((uint16_t*)&d[0]);
lea    rax,[rbp-0x20]
mov    esi,0x0
mov    rdi,rax
call   4e <main+0x4e>
movzx  eax,WORD PTR [rax] ' this is the "4e" address called
mov    WORD PTR [rbp-0x12],ax

Hope that can help.

txtechhelp
  • 6,625
  • 1
  • 30
  • 39
  • 1
    I think you meant bitwise-AND (`&`), not XOR (`^`). – Adrian McCarthy Oct 23 '17 at 20:17
  • @AdrianMcCarthy .. good catch :) Haven't had my second coffee for the day :) – txtechhelp Oct 23 '17 at 20:18
  • Shift operators don't depend on endianness, but right shifting a signed value will fill it with the sign bit. – Bob__ Oct 24 '17 at 06:29
  • @Bob__ .. correct about the endian-ness for shift operators .. but if the OP is pulling this "stream" from a network resource then endian-ness might play a role when the data is received (e.g. one system might differ, etc.) – txtechhelp Oct 24 '17 at 08:25
  • As far as I know, in most serialization processes, you transform a value into a sequence of byte/chars by right shifting and masking and then recostruct the value by left shifting and summing(or). It should work no matter what the endianness of the systems are, but you should care about the different sizes of the involed types. – Bob__ Oct 24 '17 at 08:33
  • I'm considering integer types only, BTW. – Bob__ Oct 24 '17 at 08:40
  • The message I am receiving and packing is a very large array of longs. My first thought, and what I ended up doing was bit shifting as you suggested. The main reason I was more interested in a casted pointer is that it seemed, if possible, to be a much less computationally expensive operation. – mreff555 Oct 24 '17 at 15:19
  • 2
    @DanFeerst .. Bit shifting is well less computationally complex than a pointer alias; I've updated my answer with the generated assembly to show this. Plus if the value you're grabbing needs additional reads due to alignment issues, you'll definitely have a performance impact with the extra memory pulls to get the full value. – txtechhelp Oct 24 '17 at 16:02
0

You are facing multiple issues here:

  • The bitfield packing is implementation defined
  • Reinterpreting your long as S* or S& violates the strict aliasing rule

If you stick with the long value, you either have to use assumptions about your compiler, e.g. endianess, bit-packing order, etc, or maybe disable string aliasing (which I wouldn't recommend).

Solution

If l was created by a memcpy from an instance of struct S, then copying the value back into a different instance of struct S should result in the exact same bit layout.

So, we can make a copy of the front object inside the deque into an instance of struct S and check it's s5 member:

long f = d.front();
S sf;
memcpy(&sf, &f, sizeof(sf));
std::cout << std::oct << sf.s5 << std::endl;
Daniel Trugman
  • 8,186
  • 20
  • 41
  • That is great advice, but like I said above, I created the structure simply to replicate the message. The actual way I am receiving this is by popping a long off a deque, so unfortunately, there are no members. – mreff555 Oct 23 '17 at 19:58
  • @DanFeerst, I understand you didn't try my solution with your longs, but it works as well. Updated answer – Daniel Trugman Oct 23 '17 at 20:01
  • 3
    This is undefined behavior, it violates the strict aliasing rule. – alain Oct 23 '17 at 20:02
  • With respect to the edits, can you reinterpret_cast an uninitialized value to a reference of a different type? And is the reinterpret_cast in the first code snippet implementation-defined because of the implementation-defined order of the bits in the bitfield? – Millie Smith Oct 23 '17 at 20:14
  • @MillieSmith, I fixed the answer. Is there anything unclear now? – Daniel Trugman Oct 23 '17 at 20:31
  • The union "trick" does not work either in C++, unfortunately (In C, it would). [This is a good topic about it](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). It's not my DV, btw. – alain Oct 23 '17 at 20:44
  • @alain, I'm more concerned with what I write than votes :) Anyways, I actually did look at the topic you referred me to before editing my answer. It contains the following quote: _"Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11."_ – Daniel Trugman Oct 23 '17 at 20:54
  • @DanielTrugman that's what he's saying. Works in C but not in C++. Disclaimer: I don't know and haven't read the link yet. – Millie Smith Oct 23 '17 at 20:58
  • [Here's a Q about C++ and unions](https://stackoverflow.com/questions/12297049/does-this-union-break-strict-aliasing-what-about-floating-point-registers). I haven't found a really popular one about it yet. – alain Oct 23 '17 at 21:04
  • @alain This answer appears to suggest that the behavior is defined and valid in c++: https://stackoverflow.com/a/7005988/2850543 – Millie Smith Oct 23 '17 at 21:10
  • @MillieSmith, thanks, added the reference to my answer – Daniel Trugman Oct 23 '17 at 21:14
  • @MillieSmith I still think it's not OK in C++, but it's not very clear. [More reading here](https://stackoverflow.com/questions/25664848/unions-and-type-punning) and [here](https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior). – alain Oct 23 '17 at 21:36
  • @alain, you are right, type punning could have a role here, when the value was set using a memcpy but read through the union. I updated the answer once more, and hopefully managed to explain the entire picture. – Daniel Trugman Oct 24 '17 at 06:17
  • Well, you're at the core of the problem now: The last quoted paragraph exists in the C standard, but not in the C++ standard. In one of the linked Q/As I read that GCC follows this C rule for C++ too, but this is a compiler-specific extension then. – alain Oct 24 '17 at 10:17
  • strictly by the standard, AFAIK, type punning by union is ok in `C` but not ok in `C++` – bolov Oct 24 '17 at 10:35
  • @alain, I did misread the post, saw C++11 instead of C11. I remove the additional section as I don't see a point in using a union if you have control over both the serialization and deserialization process. – Daniel Trugman Oct 24 '17 at 10:47
  • @bolov, thanks for your comment. I did misread one of the references I found. Removed the additional section as I see no point in it. – Daniel Trugman Oct 24 '17 at 10:47