2

I want to split large variables like floats into byte segments and send these serially byte by byte via UART. I'm using C/C++.

One method could be to deepcopy the value I want to send to a union and then send it. I think that would be 100% safe but slow. The union would look like this:

    union mySendUnion
    {
        mySendType sendVal;
        char sendArray[sizeof(mySendType)];
    };

Another option could be to cast the pointer to the value I want to send, into a pointer to a particular union. Is this still safe?

The third option could be to cast the pointer to the value I want to send to a char pointer, and then increment that pointer like this:

    sendType myValue = 443.2;

    char* sendChar = (char*)&myValue;

    for(char i=0; i< sizeof(sendType) ; i++)
    {
        Serial.write(*(sendChar+i), 1);
    }

I've had success with the pointer arithmetic above, but I'm not sure it's safe under all circumstances. My concern is this: suppose we are using a 32-bit processor and want to send a float. What if the compiler stores the 32-bit float in one memory cell, but stores only a single char in each 32-bit cell?

Each increment would then advance the pointer by one whole memory cell, and we would miss the rest of the float.

Is there something in the C standard that prevents this, or could this be an issue with a certain compiler?

user3050215
  • 2
    Using unions for type-punning will result in undefined behavior in C++ and should not be considered _safe_. – Captain Obvlious May 01 '14 at 19:42
  • Neither solution is safe; both can and will fail. They just happen to work often enough (most of the time) that people keep doing it... The typecast is a little safer, as it doesn't have the alignment problems. – old_timer May 01 '14 at 19:44
  • 1
    Why do you think the union will be slow? – Barmar May 01 '14 at 19:44
  • Make sure to account for endianness; not all platforms store the four bytes that make up an (e.g.) 32-bit float in the same order, so you need to make sure both ends of the connection agree on a representation. – dlf May 01 '14 at 19:45
  • @dwelch What do you mean by "typecast is safer as it doesn't have alignment problems"? Either both solutions are wrong (in C++ if the types are different) or both are good (in C99 and later, when the types are compatible or the referenced type of the aliasing pointer is `[un]signed char`) or only the union-based solution is good (in C, when the types differ but the punning pointer is not `char *`) or only the pointer-based solution is good (in C++, when the types are incompatible and the referenced type of the aliasing pointer is `[un]signed char`). – The Paramagnetic Croissant May 01 '14 at 19:56
  • The reason why you can't use unions like this is that the compiler can choose to align and pad the items as it sees fit. This topic has been discussed to death here at SO... – old_timer May 01 '14 at 20:19

3 Answers

5

First off, you can't write your code in "C/C++". There's no such language as "C/C++", as they are fundamentally different languages. As such, the answer regarding unions differs radically.

As to the title:

Are casts as safe as unions?

No, generally they aren't, because of the strict aliasing rule. That is, if you access an object of one type through a pointer to an incompatible type, the result is undefined behavior. The only exception to this rule is when you read or manipulate the byte-wise representation of an object by aliasing it through a pointer to (signed or unsigned) char, as in your case.

Unions, however, are quite different bastards. Type punning via copying to and reading from unions is permitted in C99 and later, but results in undefined behavior in C89 and all versions of C++.

In one direction, you can also safely type pun (in C99 and later) using a pointer to union, if you have the original union as an actual object. Like this:

union p {
    char c[sizeof(float)];
    float f;
} pun;
union p *punPtr = &pun;

punPtr->f = 3.14;
send_bytes(punPtr->c, sizeof(float));

Because "a pointer to a union points to all of its members and vice versa" (C99, I don't remember the exact paragraph, it's around 6.2.5, IIRC). This isn't true in the other direction, though:

float f = 3.14;
union p *punPtr = (union p *)&f;
send_bytes(punPtr->c, sizeof(float)); // triggers UB!

To sum up: the following code snippet is valid in all of C89, C99, C11 and C++:

float f = 3.14;
char *p = (char *)&f;
size_t i;
for (i = 0; i < sizeof f; i++) {
    send_byte(p[i]); // hypothetical function
}

The following is only valid in C99 and later:

union {
    char c[sizeof(float)];
    float f;
} pun;

pun.f = 3.14;
send_bytes(pun.c, sizeof(float)); // another hypothetical function

The following, however, would not be valid:

float f = 3.14;
unsigned *u = (unsigned *)&f;
printf("%u\n", *u); // undefined behavior triggered!

Another solution that is always guaranteed to work is memcpy(). The memcpy() function performs a bytewise copy between two objects. (Don't get me started on it being "slow" -- in most modern compilers and stdlib implementations, it's an intrinsic function.)
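
As a rough illustration of the memcpy() route, here is a minimal sketch. The helper name float_to_bytes is made up for this example, and the bytes come out in the host's native order, so the receiver would have to agree on that order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of a float into
   out[0 .. sizeof(float) - 1] and returns the number of bytes written.
   The byte order is whatever the host uses. */
size_t float_to_bytes(float value, unsigned char *out)
{
    memcpy(out, &value, sizeof value);
    return sizeof value;
}
```

The caller can then loop over the buffer and hand each byte to whatever UART routine is available, without ever aliasing the float through another pointer type.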

Community
  • Can you support *Type punning via copying to and reading from unions is permitted in C99 and later, but results in undefined behavior in [C89 and] all versions of C++.*? – David Rodríguez - dribeas May 01 '14 at 20:08
  • @DavidRodríguez-dribeas Sorry, I don't understand, support how? – The Paramagnetic Croissant May 01 '14 at 20:09
  • I am under the impression that it is legal in C++, but you claim it isn't. Do you have a quote that supports that statement? This is all a grey-ish area, but what is clear is that type punning is supported by all C++ compilers I know, and it is even the *recommended* way of doing this in the gcc manpage. – David Rodríguez - dribeas May 01 '14 at 20:10
  • @DavidRodríguez-dribeas Ah, I see. I am searching the C++11 standard at the moment. – The Paramagnetic Croissant May 01 '14 at 20:15
  • @DavidRodríguez-dribeas Here we go, for example, C++11, 9.2: "If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them." - and later, in 9.5: "In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time." (and here comes the part from above about standard-... – The Paramagnetic Croissant May 01 '14 at 20:21
  • @DavidRodríguez-dribeas ...-layout classes, regarding the inspection of the common initial sequence. The fact that the standard explicitly permits one thing but not another suggests to me that the latter is undefined, by omission of definition.) – The Paramagnetic Croissant May 01 '14 at 20:22
  • On the other hand, it guarantees that all of the members of the union are aligned (as if they were the single member of a struct), that in turn guarantees that the array to `[[un]signed] char` will decay to a pointer that has the same address as all of the members in the union *and* is safe to use for reading according to the aliasing rules. Or maybe that reasoning is wrong… – David Rodríguez - dribeas May 01 '14 at 21:01
  • @DavidRodríguez-dribeas I don't doubt that the *alignment* is correct. Other things may not be, though (how about a trap representation, for example?) – The Paramagnetic Croissant May 01 '14 at 21:04
  • Well, if the alignment is guaranteed, the address of the `char*` is the address of the object, and you can legally use that pointer to read the object, can you not? – David Rodríguez - dribeas May 01 '14 at 21:05
  • @DavidRodríguez-dribeas Yes, you can, but before the pointer decay, you still access an inactive member of the union, do you not? – The Paramagnetic Croissant May 01 '14 at 21:08
  • I am not sure you can call the *decay* an *access*… honestly, this is one of the parts of the C++ specification that I find more confusing and I do not know the answer (I know gcc supports this and is specifically documented, but this does not mean that the standard does, and I have not managed to nail the specification either way the couple of times I tried) – David Rodríguez - dribeas May 01 '14 at 21:18
  • @DavidRodríguez-dribeas Hm, interesting... well, yes, GCC permits it, but in my interpretation, the standard does not (and I'd better err on the side of being too safe than lead someone into relying on undefined behavior...) – The Paramagnetic Croissant May 01 '14 at 21:20
  • user3477950: Thank you for a very detailed answer. There is a jungle out there! Regarding performance, I forgot to mention that I'm programming an embedded 8-bit system, so juggling a lot of floats around takes some time. – user3050215 May 05 '14 at 15:38
  • @user3050215 You're welcome. Especially if you are programming an embedded system, you'd better leave micro-optimizations to the compiler. They are way better at fine-tuning code than most programmers these days :) – The Paramagnetic Croissant May 05 '14 at 19:03
1

As general advice, when sending floating point data on a byte stream, use some serialization technology to ensure that the data format is well defined (and preferably architecture neutral; beware of endianness issues!).
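
To make the endianness point concrete, here is a minimal hand-rolled sketch that always puts the most significant byte on the wire first. It is not XDR or any other standard library API; the names encode_float_be and decode_float_be are invented for this example, and it assumes that float is a 32-bit IEEE 754 value on both ends and that floats and integers share the same byte order on each machine.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32-bit IEEE 754 (not guaranteed by the C standard). */

/* Writes the bit pattern of value into out[0..3], most significant byte first. */
void encode_float_be(float value, unsigned char out[4])
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* reinterpret the bits without aliasing */
    out[0] = (unsigned char)((bits >> 24) & 0xFFu);
    out[1] = (unsigned char)((bits >> 16) & 0xFFu);
    out[2] = (unsigned char)((bits >> 8) & 0xFFu);
    out[3] = (unsigned char)(bits & 0xFFu);
}

/* Rebuilds the float from four bytes received most significant byte first. */
float decode_float_be(const unsigned char in[4])
{
    uint32_t bits = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
                  | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the shifts operate on the integer value rather than on memory, the wire order stays the same regardless of the host's byte order.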

You could use XDR (or perhaps ASN.1), which is a binary format (see xdr(3) for more). For C++, see also libs11n.

Unless speed or data size is very critical, I would suggest instead a textual format like JSON or perhaps YAML (textual formats are more verbose, but easier to debug and to document). There are several good libraries supporting it (e.g. jsoncpp for C++ or jansson for C).
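
As a much simpler stand-in for a full JSON library, the sketch below sends one decimal number per line using only the C standard library. The function names are made up for this example; nine significant digits are used because that is enough to round-trip a 32-bit IEEE 754 float, which is assumed here. Note that some small embedded C libraries omit floating point support in printf-style functions by default.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats value as decimal text followed by '\n'.
   Returns the number of characters written (excluding the terminating
   '\0'), or -1 if the buffer is too small or formatting fails. */
int format_float_line(float value, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%.9g\n", (double)value);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    return n;
}

/* Parses a line produced by format_float_line back into a float. */
float parse_float_line(const char *line)
{
    return strtof(line, NULL);
}
```

The text form is larger than four raw bytes, but it can be read directly in a terminal while debugging the link.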

Notice that serial ports are quite slow (w.r.t. CPU). So the serialization processing time is negligible.

Whatever you do, please document the serialization format (even for an internal project).

Basile Starynkevitch
0

The cast to [[un]signed] char [const] * is legal and it won't cause issues when reading, so that is a fine option (that is, once the cast takes the address of the value: char *sendChar = reinterpret_cast<char*>(&myValue);, and since you are at it, make it const).

The next problem is on the receiving side: you cannot safely use the same approach to read the value back. In general, the cost of copying the variables is much less than the cost of sending them over the UART, so I would just use the union when reading from the serial port.
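
For the receiving side, here is one possible sketch of the copy-based approach. It swaps the union for a plain byte buffer plus memcpy(), since whether reading through a union is valid in C++ is disputed in the comments above. The type float_receiver and the function float_receiver_push are hypothetical names, and the bytes are assumed to arrive in the receiver's native order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical receive-side state: collects bytes until a float is complete. */
typedef struct {
    unsigned char buf[sizeof(float)];
    size_t count;
} float_receiver;

/* Feeds one received byte into the receiver.
   Returns 1 and stores the reassembled value in *out once sizeof(float)
   bytes have arrived; returns 0 while the value is still incomplete. */
int float_receiver_push(float_receiver *rx, unsigned char byte, float *out)
{
    rx->buf[rx->count++] = byte;
    if (rx->count < sizeof rx->buf) {
        return 0;
    }
    memcpy(out, rx->buf, sizeof *out);  /* copy the bytes, no pointer aliasing */
    rx->count = 0;                      /* ready for the next value */
    return 1;
}
```

The copy is a handful of bytes per value, which is small next to the time the UART needs to move them.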

David Rodríguez - dribeas