1

I'm fixing some code not written by me, so I found this:

#define get_u_int16_t(X,O)  (*(u_int16_t *)(((u_int8_t *)X) + O))

How can I change it to keep the rule, if it violating it ?

The macro is called in this way:

if(get_u_int16_t(packet->payload, i) == ...) { ... }

where payload is a const unsigned char * and i is an unsigned int .

The situation is:

struct orig {
   [...]
   struct pkt packet;
}*;

struct pkt {
   [...]
   const u_int8_t *payload;
 }*;

Called in this way:

struct orig * flow;
struct pkt * packet = &flow->packet;

payload is a string

i begins with a value of 0 and it is inside a for that loop for the lenght of payload ( u_int16_t len ):

for(i = 0; i < len; i++) {
   if(get_u_int16_t(packet->payload, a) == /*value*/) {
   // do stuff
}
Kyrol
  • 3,475
  • 7
  • 34
  • 46
  • It is impossible to determine that from the code you have shown. Present a SSCCE. – 2501 Aug 17 '16 at 10:42
  • Whether this violates the strict aliasing rule depends on the type of the argument passed for `X`. If the argument is of type `u_int16_t*` and points to an object with effective type `u_int16_t`, and its value can be converted to `u_int8_t*` (`u_int8_t` has no stricter alignment requirements than `u_int16_t`), the behavior is well-defined. – EOF Aug 17 '16 at 10:42
  • I also asked here, so this is why I open this question, just wanna think better https://plus.google.com/u/0/+MicheleCampus/posts/YjUFgDQbDy8?cfem=1 – Kyrol Aug 17 '16 at 10:44
  • I'll modify adding an example of a piece of code. – Kyrol Aug 17 '16 at 10:46
  • 1
    Even with your recent addition, there's still not enough to provide a definitive answer. The type of `packet->payload` is `const unsigned char *`, sure, but what type of object is it actually pointing to? (forgetting for now the whole pointer conversion problem, since most compilers treat a pointer type conversion as returning a pointer to the same object). More specifically: what is the _effective type_ (as per 6.5p7) of the object pointed at by `packet->payload`? – davmac Aug 17 '16 at 12:44
  • You didn't tell us what types are uint8_t and uint16_t. What are the values of i? What is the element count of the payload array? Where is it pointing to. The example is incomplete. – 2501 Aug 17 '16 at 13:19
  • @2501 I'm edited the question. I inherited this code and I have to solve many bugs, so I also try to understand better. There are many confusional parts. – Kyrol Aug 17 '16 at 13:39
  • @Kyrol My comment refers to the question after the edit. – 2501 Aug 17 '16 at 13:41

2 Answers2

2

The macro itself doesn't violate the strict aliasing rule; it depends how you use it. If you only use it to read already existing objects of type u_int16_t or a compatible type then it's fine; if on the other hand you use it to read e.g. parts of a 64-bit integer or a floating-point object then that would be a strict aliasing violation, as well as (possibly) an alignment violation.

As always, you can make the code safe using memcpy:

inline u_int16_t read_u_int16_t(const void *p) {
    u_int16_t val;
    memcpy(&val, p, sizeof(val));
    return val;
}
#define get_u_int16_t(X,O)  (read_u_int16_t(((u_int8_t *)X) + O))

As @2501 points out this may be invalid if u_int8_t is not a character type, so you should just use char * for pointer arithmetic:

#define get_u_int16_t(X,O)  (read_u_int16_t(((char *)X) + O))
ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • I'm going to modify the question putting some code on how the macro is used. – Kyrol Aug 17 '16 at 10:51
  • 1
    Since we don't know if u_int8_t is a character type, the behavior may be undefined because of conversion between pointers to incompatible types. – 2501 Aug 17 '16 at 10:53
  • @2501 I wonder if it is possible for an 8-bit fixed-width integer to have stricter alignment restrictions than `char`. Obviously we can't rule out the possibility for non-standard types like the ones in the question, but if it were `uint8_t` I'm not sure it *can* have stricter alignment restrictions than `char`. – EOF Aug 17 '16 at 10:58
  • 1
    @EOF alignment is irrelevant; if `u_int8_t` is not a character type (`char`, `signed char` or `unsigned char` then the cast is illegal if `X` is not a compatible pointer. True, there are currently no 8-bit integers that are not character types but there is an extant proposal for an 8-bit non-character type to represent UTF-8 data, and it could be something crazy like an enum. – ecatmur Aug 17 '16 at 11:13
  • 2
    @ecatmur: C11 draft standard n1570: *6.3 Conversions 6.3.2.3 Pointers 7 A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned 68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.* – EOF Aug 17 '16 at 11:14
  • @EOF good point, but the remainder of that paragraph says that you can only perform pointer arithmetic (`+ O`) if the pointer is to a character type. – ecatmur Aug 17 '16 at 11:18
  • 1
    I don't know what "u_int8" is, but a compiler which doesn't treat `uint8_t` as a character type is useless for real-world applications - the C standard be damned. I very much doubt that such a useless compiler exists in the real world. – Lundin Aug 17 '16 at 11:19
  • It seems EOF is correct, and only alignment matters and not compatibility. – 2501 Aug 17 '16 at 11:20
  • @ecatmur I'm not sure if that last sentence applies to the entire paragraph or just the last sentence. – 2501 Aug 17 '16 at 11:20
  • @2501 I'm not actually quite so sure. If we use ecatmur's approach (which converts the pointer to `void*`), the pointer is not round-tripped through a different pointer type. The "*Otherwise, when converted **back** [...]*" may not be a relevant guarantee. – EOF Aug 17 '16 at 11:23
  • 1
    @2501, no, you're right in the first place, though the reasoning is difficult. The paragraph quoted by EOF doesn't say what the result of the conversion in one direction only is, that is to say, converting some pointer to another pointer type is generally allowed, but there's no stipulations about what it points to (if anything). All that is required is that it points to the original object _when converted back_ to the original type. This macro if used potentially dereferences a pointer that points at an unknown location. – davmac Aug 17 '16 at 11:24
  • @davmac Yes I agree. Only my reason is not correct. I should have said different type not incompatible type. This is because compatible types may be different. – 2501 Aug 17 '16 at 11:28
  • @davmac: So, I've already conceded that the standard doesn't guarantee this, but it *does* beg another question. Is it even logically *possible* to go through a sequence of reversible steps that end up in the same "state" (type) but with a different value? Is type not equivalent to a thermodynamic state function? – EOF Aug 17 '16 at 11:43
  • @EOF I think a strict interpretation of the standard would suggest that this could be possible, yes. In practice of course many (most? nearly all?) compilers implement pointer conversion as no operation at the machine level, and treat the result as a pointer to the same object but with different type. – davmac Aug 17 '16 at 12:04
  • @davmac: I'm asking about the *logical* possibility. Even if pointer conversions *do* cause changes to the bitwise representation, I don't see how after `Ta* a = foo(); Tb* b = (Tb*)a; Ta* c = (Ta*)b;` guarantees `a == c` without also guaranteeing that after `Tb *d = (Tb*)c`, `b == d`. – EOF Aug 17 '16 at 12:17
  • @EOF in your latter example `b == d` is guaranteed by the original rule - you take a pointer `b`, convert it to another type and back, so (assuming alignment constraints are not violated) you must have a value that compares equal with `b`. The wording of your comment prior lead me to believe you were talking about a case of (in terms of types) A -> B -> C -> A, for which there is no such guarantee. I don't see any logical problem with the value being different after such a sequence of conversions, though the compiler might have to be deliberately designed to have hostile semantics. – davmac Aug 17 '16 at 12:21
  • I edited the question to be more clear. Thanks to all for discussion – Kyrol Aug 17 '16 at 12:37
  • 1
    Text *"If you only use it to read objects of type u_int16_t or a compatible type then it's fine;*" is misleading. Code `get_u_int16_t(uint16array, 1);` will crash and burn on any platform that does not allow unaligned access. – user694733 Aug 17 '16 at 12:50
  • @user694733 there is no such object as `uint1‌​6array[0.5]` - I'll try to improve my wording though. – ecatmur Aug 17 '16 at 13:06
  • 1
    I still don't like the implication that original macro is somehow fine. Only time when macro doesn't break horribly is when data type of first argument is matching and offset is correctly calculated, for example `get_u_int16_t(uint1‌​6array, 2);`. But then you could just use `uint1‌​6array[1]`, but that defeats the original point which is to have deserialization function which can extract 16-bit integer from any byte array. Let's face it; original macro is simply garbage and has no sensible usage. – user694733 Aug 17 '16 at 13:20
  • @user694733: The macro lacks parentheses around its arguments, which is bad given some of the likely ways it might be used, but otherwise it would be fine when targeting dialects of C whose authors recognize that *quality* compilers intended for embedded or systems programming must be capable of recognizing constructs beyond those mandated by the Standard. – supercat Aug 19 '16 at 14:45
0

There are two ways to write code of the type you're interested in:

  1. Use pointers of type "unsigned char*" to access anything, assemble values out of multiple bytes, and tolerate the performance hit.

  2. If you know that pointer alignment won't be an issue, use a dialect of C that doesn't semantically gut it. GCC can implement such a dialect if invoked with -fno-strict-aliasing, but since gcc has no way to prevent obtuse "optimizations" without blocking useful and sensible optimizations, getting good performance from such a dialect may require learning how to use restrict.

Some people would argue that approach #1 is better, and for many PC-side programming purposes I'd tend to agree. For embedded software, however, I would suggest staying away from gcc's default dialect since there's no guarantee that the things gcc regards as defined behavior today will remain so tomorrow. I'm not sure where the authors of gcc got the notion that the authors of the Standard were trying to specify all of the forms of aliasing that a quality microprocessor implementation should support, rather than establish a minimum baseline for platforms where no forms of type punning other than "unsigned char" would be useful.

supercat
  • 77,689
  • 9
  • 166
  • 211
  • Thanks for the comment. So in your opinion, the answer in correct and I have to use char * instead u_int8_t as suggest ? – Kyrol Aug 19 '16 at 11:15
  • @Kyrol: Under a strict reading of the rules, there is no guarantee that `uint8_t*` will be recognized as having the same aliasing guarantees as `unsigned char*`. I think the more important point, though, is the importance of selecting a C dialect. If one is only targeting implementations with known storage layout rules, the advantages a programmer can obtain by exploiting such rules (with aliasing rules disabled) can often exceed the advantages a compiler could obtain via aliasing rules and couldn't obtain via `restrict`. – supercat Aug 19 '16 at 14:30
  • @Kyrol: Note also that on many embedded platforms, `memcpy` is absolutely dreadful for small moves. Assembling things out of `unsigned char` will often be more than 2x as slow as using `uint16_t*`; the speed penalty for `memcpy` can be closer to 10x. – supercat Aug 19 '16 at 14:33
  • Wow, I didn't know about that issue of `memcpy`. Actually I think pointers alignment could be an issue: i'm reading this http://stackoverflow.com/questions/227897/how-to-allocate-aligned-memory-only-using-the-standard-library . – Kyrol Aug 19 '16 at 23:00
  • I don't understand instead how could be usefull the `restrict`, because my pointers point to the same location. Thanks for clear me these doubts – Kyrol Aug 19 '16 at 23:08
  • @Kyrol: Efficient performance requires that a compiler be able to optimize cases where pointers won't need to alias. Places where pointers will alias can't use `restrict`, but using `restrict` in the places where it's safe will avoid the speed penalty associated with disabling type-based aliasing. – supercat Aug 22 '16 at 08:42