Hi I'm working on socket translation between two protocols. I read from a binary file and store the parsed header into an array of type uint32_t. Then I grab the fields out of the array and convert them into respective types. So far uint32_t/int32_t/uint16_t to int32_t works fine.

However, I get all kinds of wrong outputs when trying to combine two uint32_t (append one after the other) and then converting this 64bit long data into a double.

Being a newbie to C programming, I'm struggling with the computer methodology of double / float representation.

Basically what I want to do is: without altering the bit pattern of the two uint32_t, concatenate one after the other to form a 64-bit value, then reinterpret that value as a double. The most important thing is not to alter the bit pattern, as that part of the bit stream is supposed to be a double.

The following is part of the code:

uint32_t* buffer = (uint32_t*) malloc (arraySize * sizeof(uint32_t));

...

double outputSampleRate = ((union { double i; double outputSampleRate; })
    { .i = ((uint64_t)buffer[6] << 32 | (uint64_t)buffer[7])}).outputSampleRate;

Data in input file:

35.5

value after my code:

4630192998146113536.000000

Also, is there a better way to handle the socket header parsing?

chux - Reinstate Monica
KKsan
  • 2
    "I get all kinds of wrong outputs " --> what was your input, output and expected output. – chux - Reinstate Monica Jul 26 '16 at 22:13
  • 1
    Not all bit patterns possible with 2 `uint32_t` may make for a valid/distinct `double`. It depends on `double` format and potential not-a-numbers. – chux - Reinstate Monica Jul 26 '16 at 22:15
  • 1
    "is there a better way to handle the socket header parsing?" Post the code you have tried to get good feedback, else this is just too broad. – chux - Reinstate Monica Jul 26 '16 at 22:16
  • You're running up against padding problems. First, double-check that your uint32 -> uint64 conversion is working. – TLW Jul 26 '16 at 22:16
  • Looks like just another version of undefined behaviour. See [ask] and provide a [mcve]. – too honest for this site Jul 26 '16 at 22:28
  • @olaf This is implementation defined territory, not UB. – Dietrich Epp Jul 26 '16 at 22:44
  • @DietrichEpp: I seem to have missread something. Not sure why OP uses a `union` anyway. For conversion it is absolutely useless to have both members with the same type. – too honest for this site Jul 26 '16 at 22:48
  • You can't take a uint64_t and assign it to a double because the computer will change the bit pattern - what you have done is no different than saying `uint64_t j = 4630192998146113536LL; double i = j;` because both items in your union are doubles. The way I used to do it was to set up a uint64_t variable and static cast it's pointer to a double* and then dereference it: `uint64_t j = 4630192998146113536; double i = *(double*)&j;` – Jerry Jeremiah Jul 26 '16 at 22:48
  • That is a bad approach. Define a proper transfer protocol and use proper conversion functions e.g. using shift/masking or - better for debugging, etc. - convert to/from a textual format. Note that your use of the term "protocol" does not sound correct. There are no two "protocols" you use the sockets between, but two sockets you use a **single** and well-defined protocol. Using well established terms wrongly is a guarantee for confusion. – too honest for this site Jul 26 '16 at 22:50
  • Note: As an integer `4630192998146113536` --> `0x4041C00000000000` which happens to be the bit pattern for 35.5 in IEEE-754 Floating-Point. See [here](http://babbage.cs.qc.cuny.edu/IEEE-754.old/64bit.html). – chux - Reinstate Monica Jul 26 '16 at 23:00
  • 1
    @olaf The suggestion to use a textual protocol—while its often a good idea, it is going too far to say this is a "bad approach". Modern computers use IEEE 64-bit doubles almost exclusively, and using a union is one of only two ways to reinterpret bit patterns in C. Serializing floating point numbers by writing them directly to disk or network is so mind-numbingly common that I'm surprised that this is even in dispute here. Again, text protocols are great, but it's inappropriate to say that a binary protocol is a "bad approach" without more fully understanding the problem it is trying to solve. – Dietrich Epp Jul 26 '16 at 23:08
  • 1
    @DietrichEpp: I did not state a binary approach is bad as such! Just the way OP leaves issues like endianess to the implementation. I'm the first to accept a well-defined binary protocol. And IEEE floats are not that universally used as you might think. In embedded systems (keyword: IoT), there are a lot of systems with other encodings or - at best - partial implementation of the IEEE format, especially `double` (ARM-Cortex-M4/7F e.g. supports 32 bit `float` only in hardware). – too honest for this site Jul 26 '16 at 23:12
  • Thank you all for these inputs! Definitely helpful! Sorry for being newbie :( – KKsan Jul 27 '16 at 02:11
  • @Olaf: That's an excellent comment, which clearly expresses potential issues with OP's approach. – Dietrich Epp Jul 27 '16 at 13:14

2 Answers

1

Your union definition is incorrect, you want i to be defined as uint64_t:

double outputSampleRate = ((union { uint64_t i; double d; })
    { .i = ((uint64_t)buffer[6] << 32 | (uint64_t)buffer[7])}).d;
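To make the difference concrete, here is a minimal sketch contrasting reinterpretation with numeric conversion. The helper names `bits_to_double` and `value_to_double` are made up for this illustration, and it assumes the host uses IEEE-754 64-bit doubles:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* The punning below only makes sense if double is exactly 64 bits wide. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical helper: reinterpret a 64-bit pattern as a double.
   The integer member is written, then the double member is read,
   so the bytes are kept and only their interpretation changes. */
double bits_to_double(uint64_t bits)
{
    union { uint64_t i; double d; } pun = { .i = bits };
    return pun.d;
}

/* Hypothetical helper: ordinary numeric conversion, which is what the
   union with two double members in the question ends up doing. */
double value_to_double(uint64_t bits)
{
    return (double)bits;
}
```

With the pattern `0x4041C00000000000` from the question, the first helper should give 35.5 on an IEEE-754 host, while the second gives 4630192998146113536.0, the value reported in the question.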

You might also be running into an endianness issue. Try little endian:

double outputSampleRate = ((union { uint64_t i; double d; })
    { .i = ((uint64_t)buffer[7] << 32) | (uint64_t)buffer[6]}).d;
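If it is not known in advance which word carries the high half, the choice can be made explicit. This is only a sketch: `word_order` and `words_to_double` are hypothetical names, and it assumes IEEE-754 64-bit doubles on the host:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Hypothetical names, used only for this illustration. */
typedef enum { HIGH_WORD_FIRST, LOW_WORD_FIRST } word_order;

/* Join two 32-bit words into one 64-bit pattern in the requested
   order, then reinterpret that pattern as a double via a union. */
double words_to_double(uint32_t first, uint32_t second, word_order order)
{
    uint32_t high = (order == HIGH_WORD_FIRST) ? first : second;
    uint32_t low  = (order == HIGH_WORD_FIRST) ? second : first;
    union { uint64_t i; double d; } pun;

    pun.i = ((uint64_t)high << 32) | (uint64_t)low;
    return pun.d;
}
```

For the header in the question this would be called as `words_to_double(buffer[6], buffer[7], HIGH_WORD_FIRST)` or with `LOW_WORD_FIRST`, depending on what the sender actually wrote.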

Reinterpreting the bits of the representation via a union is actually supported by the C Standard and is known as type punning. It is not guaranteed to work if the bits represent a trap value for the destination type.

You could try other casts and tricks: test your luck and use a pointer cast:

double outputSampleRate = *(double*)&buffer[6];

Another way to force type punning is to use the memcpy function:

double outputSampleRate;
uint64_t temp = ((uint64_t)buffer[7] << 32) | (uint64_t)buffer[6];
memcpy(&outputSampleRate, &temp, sizeof(outputSampleRate));

Or simply:

double outputSampleRate;
memcpy(&outputSampleRate, &buffer[6], sizeof(outputSampleRate));

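Copying the bytes into a declared double with memcpy does not run into the aliasing problem of the pointer cast, but the result still depends on both sides agreeing on the double representation and on the byte order. I have seen some instances of both of the above in production code.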
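A small round-trip sketch of the memcpy variant, with made-up helper names (`double_from_words`, `words_from_double`). It takes the words in whatever order they sit in memory, so it only gives the intended value if that order matches the host's layout of a double:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == 2 * sizeof(uint32_t), "double must be 64 bits");

/* Hypothetical helper: copy the 8 bytes stored in words[0] and
   words[1] into a double, keeping the bytes exactly as they are. */
double double_from_words(const uint32_t *words)
{
    double d;
    memcpy(&d, words, sizeof d);
    return d;
}

/* Hypothetical inverse: copy the bytes of a double into two words. */
void words_from_double(double d, uint32_t *words)
{
    memcpy(words, &d, sizeof d);
}
```

Splitting a double with `words_from_double` and feeding the result to `double_from_words` on the same machine returns the original value, which is a quick way to check the plumbing before worrying about the sender's byte order.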

chqrlie
  • Not in the mood to check, but there is something in the standard (or a defect report) that the `union` approach very well is guaranteed to work. The `memcpy` approach OTOH violates the effective type rule like a cast. – too honest for this site Jul 26 '16 at 23:17
  • I found it: http://stackoverflow.com/questions/11373203/accessing-inactive-union-member-undefined-behavior – Jerry Jeremiah Jul 26 '16 at 23:27
  • @Olaf: I am surprised `memcpy` violates the effective type rule: it uses a `void*` and could be implemented as reading and writing `unsigned char*` which are an special case IIRC. – chqrlie Jul 26 '16 at 23:45
  • @chqrlie: I'm not. Because `memcpy` is the same as the cast/assignment basically. See 6.5p6++. There's still the union left (6.5.2.3, footnote 95). – too honest for this site Jul 26 '16 at 23:51
  • @Olaf: so you are referring to 6.5p6 on effective type and copying with `memcpy`. This paragraph is quite obscure to me. – chqrlie Jul 26 '16 at 23:51
  • And p7. And yes, at least to me as ESL, it really is hard to comprehend in all its aspects. But I hopefully figured it out now (with a lot of discussions here over the last year or so). It is imo one of the most relevant sections in the standard, too. – too honest for this site Jul 26 '16 at 23:54
  • @Olaf: English is actually my fourth language, but somehow I don't feel ashamed to admit I cannot make sense of paragraph 6. These are the relevant sections, but I'm not relieved. – chqrlie Jul 27 '16 at 00:29
1

Reinterpreting bit patterns through a union requires that the union elements have the right type. Your union has two doubles, so when you read from one, it will have the same value as the other. The conversion from uint64_t to double is one that preserves the numeric value, explaining the "garbage", which is really just the double's bit pattern read as an integer. You will also need to use the correct byte order (low word first? high word first?) and the easiest way to do this is by avoiding bit shifting altogether.

double outputSampleRate = ((union { uint32_t i[2]; double d; })
    { .i = { buffer[6], buffer[7] } }).d;

You could use uint64_t i but... why bother?
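As a sketch of what the array form does, the two helpers below (hypothetical names) join the words in memory order and report which array index holds the high word on the host. The probe assumes IEEE-754 64-bit doubles, where 35.5 has the high word `0x4041C000`:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

static_assert(sizeof(double) == 2 * sizeof(uint32_t), "double must be 64 bits");

/* Hypothetical helper: place two words side by side in memory and
   read the same 8 bytes back as a double. No shifting is involved,
   so the host's own word order decides which word is the high half. */
double join_words(uint32_t first, uint32_t second)
{
    union { uint32_t i[2]; double d; } pun = { .i = { first, second } };
    return pun.d;
}

/* Hypothetical probe: returns 0 if the host keeps the high word of a
   double at index 0, otherwise 1 (assumes IEEE-754 binary64). */
int host_high_word_index(void)
{
    union { double d; uint32_t i[2]; } pun = { .d = 35.5 };
    return (pun.i[0] == UINT32_C(0x4041C000)) ? 0 : 1;
}
```

If the words in `buffer` were read on the same kind of machine that wrote them, `join_words(buffer[6], buffer[7])` needs no further thought. Otherwise the probe shows which argument has to carry the high word.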

You could also use memcpy() to copy the bytes...

double outputSampleRate;
memcpy(&outputSampleRate, &buffer[6], sizeof(outputSampleRate));

The usual caveats apply: while these solutions are relatively portable, they do not take endian issues into account, and they will not work on systems that violate your assumptions about e.g. how big a double is, but it is generally safe to make these assumptions.
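Those assumptions can be checked instead of taken on faith. A minimal sketch, where the function name `double_matches_ieee754` is made up for this example, the compile-time checks cover size and mantissa width, and the run-time probe compares against the known pattern for 35.5:

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Compile-time checks: fail the build if double is not 64 bits wide
   or does not have the 53-bit binary mantissa of IEEE-754 binary64. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");
static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53,
              "double does not look like IEEE-754 binary64");

/* Hypothetical run-time probe: returns 1 if 35.5 is stored with the
   bit pattern 0x4041C00000000000 when its bytes are read back as a
   64-bit integer, otherwise 0. */
int double_matches_ieee754(void)
{
    const double probe = 35.5;
    uint64_t bits;

    memcpy(&bits, &probe, sizeof bits);
    return bits == UINT64_C(0x4041C00000000000);
}
```

Calling the probe once at start-up and refusing to parse headers when it returns 0 turns a silent wrong value into an obvious failure.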

Dietrich Epp