2

I've encountered a similar problem to the one described in another thread (perf_event_open - how to monitoring multiple events). I was able to solve it and the code is working, but I want to understand why this part actually works and why it is not a violation of any kind:

char buf[4096];
struct read_format* rf = (struct read_format*) buf;

struct read_format is defined as follows:

struct read_format {
    uint64_t nr;
    struct {
        uint64_t value;
        uint64_t id;
    } values[/*2*/];
};

How does the compiler know to which value uint64_t nr should be initialized? Or how to initialize the inner struct right?

yxcv42
  • It doesn't. The field *nr* is initialized to whatever happens to be in *buf*. – August Karlstrom Jan 24 '21 at 19:17
  • So just for clarification: the first 64 bits will go to the `nr` field, the next 64 bits will be the first entry of `values[]`, to be exact `values[0].value`, and then the bits from 128 to 191 will be the value of `values[0].id`, and so on? – yxcv42 Jan 24 '21 at 19:51
  • @AugustKarlstrom the field `nr` is not initialized, and trying to read it is undefined behaviour due to a strict aliasing violation – M.M Jan 24 '21 at 21:43

2 Answers

0

It doesn't. The buffer is zero-initialized and the struct pointer is initialized with a pointer to the buffer.

It looks completely whack; however it really isn't. The read function is going to read as many structures into the buffer as fit.

The outer structure is variable-length. The advance loop looks like this:

    struct read_format *current = rf;
    if (readstructs(..., &current, 4096)) {
        for (;current;current=current->nr?((struct read_format *)((char *)current + current->nr)):NULL) {
        }
    }

These things appear in system-level OS calls to decrease the complexity of copying memory across security boundaries. The read side is easy and well-taught. The writer performs the operations necessary in filling the buffer to ensure this simple reader does not violate any system-level constraints. The code will work despite looking like it violates types left and right because the writer has set it up to work. In particular, the pointer will be aligned.

I've seen a similar method used in old file formats. Unfortunately that only follows the rules of the platform that wrote it (usually something ancient and far more permissive than a modern system) and leads to having to write a byte-at-a-time reader because the host doing the reading doesn't correspond.

Joshua
  • `(struct read_format *)((char *)current + current->nr)` is [a strict aliasing violation](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) and therefore undefined behavior. It can also violate any alignment restrictions on the system and can be undefined behavior for that reason, too. – Andrew Henle Jan 24 '21 at 19:55
  • @AndrewHenle: It does not violate the strict aliasing rule because the writer did the reverse operation when it wrote the buffer out. For the same reason, the result is aligned. – Joshua Jan 24 '21 at 19:57
  • @AndrewHenle: For a live example, see `man 2 readdir`. – Joshua Jan 24 '21 at 19:59
  • *It does not violate the strict aliasing rule because the writer did the reverse operation when it wrote the buffer out.* That is not true. If the buffer is declared as a `char` buffer, ***it's a `char` buffer*** and reading it as a non-`char` type violates strict aliasing. Just because `readdir` happens to do so does not make it proper. – Andrew Henle Jan 24 '21 at 20:11
  • @AndrewHenle: Then `fread()` cannot exist, but `fread()` does in fact exist. The same construct appears in the definition of `FILE*`. The buffer inside `FILE` is a `char` buffer but `fread()` reads arbitrary structures from it. – Joshua Jan 24 '21 at 20:19
  • @Joshua Source code that forms part of the implementation doesn't have to comply with the standard. Only the code that is part of the program does. The implementation doesn't even have to be written in C. – M.M Jan 24 '21 at 21:46
  • Also, `FILE` on my system doesn't contain any buffer, it contains a pointer to a buffer. (Maybe yours is different.) – M.M Jan 24 '21 at 21:48
  • @M.M: It still casts back and forth between a pointer to char and a pointer to struct and demands that it be interpreted as sound. – Joshua Jan 25 '21 at 01:10
  • @Joshua There may not be any strict aliasing violations involved there (a char pointer is not a char buffer). But as mentioned in my earlier comment, it's moot as the implementation code is not subject to the C Standard – M.M Jan 25 '21 at 01:15
0

This code is incorrect in Standard C:

char buf[4096];
read(fd1, buf, 4096);   // Assume error handling, omitted for brevity
struct read_format* rf = (struct read_format*) buf;
printf("%llu\n", rf->nr);

There are two distinct issues here, which should not be conflated:

  • buf might not be correctly aligned for struct read_format. If it isn't, the behaviour is undefined.
  • Accessing rf->nr violates the strict aliasing rule and the behaviour is undefined. An object with declared type char cannot be read or written by an expression of type unsigned long long. Note that the converse is not true.

Why does it appear to work? Well, "undefined" does not mean "must explode". It means the C Standard no longer specifies the program's behaviour. This sort of code is somewhat common in real code bases. The major compiler vendors -- for now -- include logic so that this code will behave as "expected", otherwise too many people would complain.

The "expected" behaviour is that accessing *rf should behave as if there exists a struct read_format object at the address, and the bytes of that object are the same as the bytes of buf. This is similar to what would happen if the two were in a union.
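The layout that this "expected" behaviour relies on can be checked without any pointer cast. The following is only a minimal sketch: the helper names `load_u64`, `value_offset` and `id_offset` are hypothetical, and it assumes the inner struct has no padding between `value` and `id`, which holds on common platforms but is not promised by the Standard. It copies bytes out of the char buffer with `memcpy`, which may read the bytes of any object regardless of its declared type or alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct read_format {
    uint64_t nr;
    struct {
        uint64_t value;
        uint64_t id;
    } values[];
};

/* Hypothetical helper: copy one uint64_t out of a char buffer.
   memcpy works on raw bytes, so neither the aliasing rule nor the
   alignment of buf matters here. */
uint64_t load_u64(const char *buf, size_t offset)
{
    uint64_t v;
    memcpy(&v, buf + offset, sizeof v);
    return v;
}

/* Byte offset of values[i].value inside the buffer. */
size_t value_offset(size_t i)
{
    /* sizeof does not evaluate its operand, so the null pointer is never used */
    size_t entry_size = sizeof(((struct read_format *)0)->values[0]);
    return offsetof(struct read_format, values) + i * entry_size;
}

/* Byte offset of values[i].id; assumes no padding after value. */
size_t id_offset(size_t i)
{
    return value_offset(i) + sizeof(uint64_t);
}
```

With these helpers, `load_u64(buf, 0)` yields what `rf->nr` is expected to yield, `load_u64(buf, value_offset(0))` corresponds to `rf->values[0].value`, and so on through the buffer.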

The code could be made compliant with a union:

union
{
    char buf[4096];
    struct read_format rf;
} u;

read(fd1, u.buf, sizeof u.buf);
printf("%llu\n", u.rf.nr);   // rf is a struct member, not a pointer

The strict aliasing rule is "disabled" for union members accessed by name; and this also addresses the alignment problem since the union will be aligned for all members.
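As a self-contained sketch of the union approach, the function below stands in for the `read()` call by copying caller-supplied bytes into the union's char member, then reads the result back through the struct member. The names `union read_buffer` and `counter_value` are hypothetical and exist only for illustration; real code would fill `u.buf` from the perf file descriptor instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct read_format {
    uint64_t nr;
    struct {
        uint64_t value;
        uint64_t id;
    } values[];
};

/* Hypothetical named version of the union from the answer. */
union read_buffer {
    char buf[4096];
    struct read_format rf;
};

/* Returns values[index].value from the raw bytes, or 0 if index is
   not below nr or would fall outside the buffer. */
uint64_t counter_value(const void *raw, size_t len, size_t index)
{
    union read_buffer u;
    size_t max_entries =
        (sizeof u.buf - offsetof(struct read_format, values))
        / sizeof u.rf.values[0];

    assert(len <= sizeof u.buf);
    memset(u.buf, 0, sizeof u.buf);
    memcpy(u.buf, raw, len);   /* stand-in for read(fd1, u.buf, sizeof u.buf) */

    if (index >= u.rf.nr || index >= max_entries)
        return 0;
    /* Accessing the union member by name: no cast from char * needed. */
    return u.rf.values[index].value;
}
```

Given raw data holding `nr = 2` followed by two (value, id) pairs, `counter_value(raw, sizeof raw, 1)` returns the second counter's value. The bounds check against `max_entries` is an addition of this sketch, so that a bogus `nr` cannot push the access past the end of the buffer.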

It's up to you whether to be compliant, or to trust that compilers will continue to put practicality ahead of maximal optimization within the constraints permitted by the Standard.

M.M
  • Oh hey. You actually found something. I kept on trying to draw distinctions in form, knowing the root must be correct. I was not willing to say it only works if allocated from the heap. But adding the union to the global does indeed change the meaning. – Joshua Jan 25 '21 at 02:24