Reading struct from mmap

Question

typedef struct aaa {
  int a;
  int b;
  long ptr_to_st2; //offset from the beginning of the file.
} st1;

typedef struct bbb {
  int get;
  char it;
} st2;

I have a binary file mapped to memory using mmap. The file contains st1 at the beginning of the file and then some data and then st2.

unsigned char *filemap; //mmap
st1 *first=(st1 *)filemap;
st2 *second=(st2 *)filemap+first->ptr_to_st2;
printf("%c",second->it);

I've been told this code is incorrect and violates the strict aliasing rule. What is the correct way to write this code? Thanks.

[Perhaps reading **this** will help](http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). — WhozCraig, Jul 24 '15 at 03:48
Is your `ptr_to_st2` actually pointing to `st2`? Additionally (though perhaps not necessarily), `ptr_to_st2` should be of the type `st2` — WedaPashi, Jul 24 '15 at 03:51

score 1 · Accepted Answer · edited May 23 '17 at 10:26

To put it simply, int has an alignment requirement. Supposing sizeof (int) is two on your machine, and we look at your memory as a sequence of blocks:

[a][a][b][b][c][c][d][d]...

We can store an int in the [a] blocks, the [b] blocks and so on... Basically at every second address... but not between them.

On our common household machines, we may in fact be able to store them in between, but this comes at a performance cost; the bus is still aligned to retrieve integers that satisfy the alignment requirement, so there'll be two retrievals via the bus for every one misaligned integer. That is undesired.

On uncommon household machines (such as old Apples, or even those things we don't commonly program for, such as just about every router on the planet) such a misaligned access will cause a condition similar to a segfault, known as a bus error. That is definitely undesired!
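
One way to see whether a particular offset runs into this is to test the address before it is cast. This is only a sketch: `is_aligned` is a hypothetical helper, not something from the question, and it assumes a platform where converting a pointer to `uintptr_t` yields the numeric address (common, but implementation-defined).

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of `alignment`, 0 otherwise.
   Assumption: the pointer-to-uintptr_t conversion gives the numeric
   address, which holds on common flat-memory platforms. */
static int is_aligned(const void *p, size_t alignment) {
    return (uintptr_t) p % alignment == 0;
}
```

With the question's variables, the check would look like `is_aligned(filemap + first->ptr_to_st2, _Alignof(st2))` (`_Alignof` needs C11).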

If you serialise and deserialise your information properly (as opposed to just using typecasts to reinterpret parts of the array), you won't see any of these problems. That is, translate your structures byte by byte, for example:

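void serialise_st1(void *destination, st1 *source) {
    unsigned char *d = destination;
    /* Converting to unsigned int wraps negative values, so the low
       16 bits hold the two's complement encoding of the value. */
    unsigned long  s = (unsigned int) source->a;

    /* a: two bytes, most significant first */
    d[0] = s >> 8;
    d[1] = s;

    /* b: two bytes, most significant first */
    s = (unsigned int) source->b;
    d[2] = s >> 8;
    d[3] = s;

    /* ptr_to_st2: four bytes, most significant first */
    s = source->ptr_to_st2;
    d[4] = s >> 24;
    d[5] = s >> 16;
    d[6] = s >> 8;
    d[7] = s;
}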

Notice how I translated the values into bytes manually, one byte at a time? The deserialisation process is a little tougher due to the need to handle the sign, but it is essentially the reverse: rather than assigning to each byte individually, we read each byte individually.

void deserialise_st1(st1 *destination, void *source) {
    unsigned char *s = source;
    *destination = (st1) { .a = (s[0] <= 127 ? s[0] : -(256 - s[0])) * 0x0100
                              +  s[1],
                           .b = (s[2] <= 127 ? s[2] : -(256 - s[2])) * 0x0100
                              +  s[3],
                           .ptr_to_st2 = (s[4] <= 127 ? s[4] : -(256 - s[4])) * 0x01000000
                                       +  s[5] * 0x00010000
                                       +  s[6] * 0x00000100
                                       +  s[7] };
}
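
The sign handling in `deserialise_st1` is easier to follow in isolation. Here is a minimal sketch, assuming the same layout as above (two bytes, most significant first, two's complement). `decode_be16` is a hypothetical helper name, not part of the original code.

```c
/* Decode two big-endian bytes holding a 16-bit two's complement value.
   The result is a long so the arithmetic cannot overflow a 16-bit int. */
static long decode_be16(const unsigned char *s) {
    /* A high byte above 127 has the sign bit set: map 128..255 to -128..-1. */
    long high = s[0] <= 127 ? (long) s[0] : (long) s[0] - 256;
    return high * 0x0100 + s[1];
}
```

For example, the bytes `0xFF 0xFE` decode to -2, and `0x01 0x02` decode to 258.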

Then, adapting upon your example:

unsigned char *filemap;
st1 first;
deserialise_st1(&first, filemap);

I'll leave it as an exercise for you to write deserialise_st2, but feel free to ask if you have any problems doing so.

st2 second;
deserialise_st2(&second, filemap + first.ptr_to_st2);
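
For comparison, one possible shape for `deserialise_st2` is sketched below. The question does not say how `st2` is laid out in the file, so the layout here is an assumption: `get` occupies two bytes (most significant first, two's complement), followed by one byte for `it`.

```c
typedef struct bbb {
  int get;
  char it;
} st2;

/* Assumed layout (not given in the question):
   bytes 0-1: get, most significant byte first, two's complement
   byte  2:   it */
void deserialise_st2(st2 *destination, const void *source) {
    const unsigned char *s = source;
    *destination = (st2) { .get = (s[0] <= 127 ? s[0] : -(256 - s[0])) * 0x0100
                                +  s[1],
                           .it  = (char) s[2] };
}
```

If the file was written with a different width or byte order for `get`, the indices and multipliers would need to change to match.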

Assuming your code goes on to update first or second, and you want to push those updates into your filemap, you would need to know the offset that each came from... That is, you'll want to associate filemap as the pointer to first (first_ptr), and filemap + first.ptr_to_st2 as the pointer to second (second_ptr)... Then:

serialise_st1(first_ptr, &first);
serialise_st2(second_ptr, &second);

Thanks for your helpful reply. I have a couple of questions. For every CPU, do I need to write a different serialize function according to the sizes of that CPU's types? Also, can you please explain why you cast `source->a` to `unsigned int` and shift right by one byte? `unsigned long s = (unsigned int) source->a; d[0] = s >> 8;` — Neet33, Jul 24 '15 at 12:40
@Neet33 No. You should choose the integer based on the range that is required in standard C, and serialise for that. That is, if you're going to use values within the range -32767 to 32767 then use `int`... Otherwise, if you intend to use values within the range -2147483647 to 2147483647, use `long`. Otherwise, consider using `long long`. — autistic, Jul 24 '15 at 13:13
@Neet33 I converted the signed value to an unsigned value to encode the sign bit for negative values; that's the cleanest way I could think of doing that. The right shift extracts the high byte of the 16-bit value. — autistic, Jul 24 '15 at 13:15
I have another question I hope you can help me with. If I know which system is going to run the program, is there another, better way I can write this? Maybe reading straight from memory (like my example in the question, just without violating the strict aliasing rule)? — Neet33, Jul 30 '15 at 23:38
Hi, I need help serializing this `typedef struct { union { char a[8]; struct { unsigned long z; unsigned long o; } ss; } ss; } st3;` Can you help me? Thanks. — Neet33, Aug 01 '15 at 01:26
@Neet33 You need to put a tag in the outer struct so that you know which union member to use... Once you've done that, you should be able to think about this as serialising the tag, then serialising the member that corresponds to the tag.... — autistic, Aug 03 '15 at 12:33
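
The tag-then-member idea from the last comment could be sketched roughly as follows. Everything beyond the original `st3` members is an assumption made for illustration: the `tag` field, the `ST3_BYTES` / `ST3_PAIR` names, the 9-byte output layout, and the choice to store each `unsigned long` as four bytes (values are assumed to fit in 32 bits).

```c
#include <string.h>

enum st3_tag { ST3_BYTES = 0, ST3_PAIR = 1 };  /* hypothetical tag values */

typedef struct {
    enum st3_tag tag;  /* records which union member is in use */
    union {
        char a[8];
        struct { unsigned long z; unsigned long o; } ss;
    } ss;
} st3;

/* Write the low 32 bits of v as four bytes, most significant first. */
static void put_be32(unsigned char *d, unsigned long v) {
    d[0] = (unsigned char) ((v >> 24) & 0xFF);
    d[1] = (unsigned char) ((v >> 16) & 0xFF);
    d[2] = (unsigned char) ((v >> 8) & 0xFF);
    d[3] = (unsigned char) (v & 0xFF);
}

/* Serialise the tag (1 byte), then the member it selects (8 bytes).
   destination must have room for 9 bytes. */
void serialise_st3(void *destination, const st3 *source) {
    unsigned char *d = destination;
    d[0] = (unsigned char) source->tag;
    if (source->tag == ST3_BYTES) {
        memcpy(d + 1, source->ss.a, 8);
    } else {
        put_be32(d + 1, source->ss.ss.z);
        put_be32(d + 5, source->ss.ss.o);
    }
}
```

A matching deserialiser would read the tag byte first and then decode whichever member it names.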
