Store C structs for multiple platform use - would this approach work?

Question

Compiler: GNU GCC

Application type: console application

Language: C

Platforms: Win7 and Linux Mint

I wrote a program that I want to run under Win7 and Linux. The program writes C structs to a file and I want to be able to create the file under Win7 and read it back in Linux and vice versa.

By now, I have learned that writing complete structs with fwrite() almost guarantees that the file won't be read back correctly on the other platform. This is due to padding and possibly other causes.

I defined all structs myself and (after my previous question on this forum) all their members are now of type int32_t, int64_t or char. I am thinking about writing a WriteStructname() function for each struct that writes the individual members as int32_t, int64_t and char to the output file, and likewise a ReadStructname() function that reads the individual members from the file and copies them into an empty struct.

Would this approach work? I prefer to have maximum control over my source code, so I'm not looking for libraries or other dependencies to achieve this unless I really have to.

Thanks for reading

Convert struct to standard XML or JSON for ultimate portability... just a thought. — Chimera, Jan 11 '16 at 20:42

score 1 · Answer 1 · answered Jan 11 '16 at 18:41

Element-wise writing of data to a file is your best approach, since structs will differ due to alignment and packing differences between compilers.
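To see why whole-struct writes are fragile, it can help to compare the size of a struct with the sum of its member sizes. The following is only a sketch: `struct sample`, `SAMPLE_PAYLOAD` and `print_layout` are made-up names for illustration, and the numbers printed depend on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct, for illustration only. */
struct sample {
    char    tag;
    int64_t big;
    int32_t small;
};

/* Bytes occupied by the members themselves, without any padding. */
enum { SAMPLE_PAYLOAD = sizeof(char) + sizeof(int64_t) + sizeof(int32_t) };

/* Print the in-memory layout chosen by this compiler for this target. */
void print_layout(void)
{
    printf("sizeof(struct sample) = %lu, payload = %d\n",
           (unsigned long)sizeof(struct sample), (int)SAMPLE_PAYLOAD);
    printf("offsets: tag=%lu big=%lu small=%lu\n",
           (unsigned long)offsetof(struct sample, tag),
           (unsigned long)offsetof(struct sample, big),
           (unsigned long)offsetof(struct sample, small));
}
```

Any difference between the two sizes is padding, and fwrite() on the whole struct would copy that padding into the file.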

However, even with the approach you're planning on using, there are still potential pitfalls, such as different endianness between systems, or different encoding schemes (e.g. two's complement versus one's complement encoding of signed numbers).

If you're going to do this, you should consider something like a JSON parser to encode and decode your data so you don't corrupt it due to the issues mentioned above.

Good luck!

answered Jan 11 '16 at 18:41

Cloud


While you are right in general, gcc only supports 2s complement. And the conversion from signed to unsigned is well-defined. OP clearly states he does not want an external library. – too honest for this site Jan 11 '16 at 18:50
@Olaf True, OP did ask for no external libraries, but the alternative is writing a parser from scratch. If so, OP would be best off just storing the info as ASCII encoded text strings. A lot of questions on SO seem to follow this format `I need feature XYZ, how do I do it without any external libraries`, and the end result is either, `just use a library`, or `be prepared for a lot of work while you re-invent the wheel`. Even if OP wants to write a parser, the info provided above should help him/her along the way. – Cloud Jan 11 '16 at 23:52
There is no need to use a text format. A binary format will be as portable if defined and implemented properly. See @dbush's answer. – too honest for this site Jan 12 '16 at 08:38
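On the signed-number point raised in these comments, here is a rough sketch of how a signed field could be carried through an unsigned one. The helper names `i32_to_u32` and `u32_to_i32` are invented for this example. The first direction relies only on the well-defined signed-to-unsigned conversion; the second avoids the out-of-range unsigned-to-signed cast, whose result the C standard leaves implementation-defined (GCC wraps it, so a plain cast would also work there).

```c
#include <stdint.h>

/* Signed -> unsigned: well-defined, the value is reduced modulo 2^32. */
uint32_t i32_to_u32(int32_t v)
{
    return (uint32_t)v;
}

/* Unsigned -> signed without casting an out-of-range value. */
int32_t u32_to_i32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    /* u is in [2^31, 2^32 - 1]; map it onto [INT32_MIN, -1]. */
    return (int32_t)(u - UINT32_C(0x80000000)) + INT32_MIN;
}
```

The unsigned value is what would actually be split into bytes and written to the file.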

Felipe Lavratti · Answer 2 · 2016-01-12T16:27:45.297

If you use GCC or any other compiler that supports "packed" structs, you are platform safe as long as you use only [u]intX_t types in the struct and fix the endianness of every field wider than 8 bits :)

Here is example code that is portable between platforms. Do not forget to set UIP_BYTE_ORDER manually to match the endianness of each platform.

#include <stdint.h>
#include <stdio.h>

/* These macros are set manually; you should use some automated detection method */
#define UIP_BIG_ENDIAN 1
#define UIP_LITTLE_ENDIAN 2
#define UIP_BYTE_ORDER UIP_LITTLE_ENDIAN

/* Borrowed from uIP */
#ifndef UIP_HTONS
#   if UIP_BYTE_ORDER == UIP_BIG_ENDIAN
#      define UIP_HTONS(n) (n)
#      define UIP_HTONL(n) (n)
#      define UIP_HTONLL(n) (n)
#   else /* UIP_BYTE_ORDER == UIP_BIG_ENDIAN */
#      define UIP_HTONS(n) (uint16_t)((((uint16_t) (n)) << 8) | (((uint16_t) (n)) >> 8))
#      define UIP_HTONL(n) (((uint32_t)UIP_HTONS(n) << 16) | UIP_HTONS((uint32_t)(n) >> 16))
#      define UIP_HTONLL(n) (((uint64_t)UIP_HTONL(n) << 32) | UIP_HTONL((uint64_t)(n) >> 32))
#   endif /* UIP_BYTE_ORDER == UIP_BIG_ENDIAN */
#else
#error "UIP_HTONS already defined!"
#endif /* UIP_HTONS */


struct __attribute__((__packed__)) s_test
{
    uint32_t a;
    uint8_t b;
    uint64_t c;
    uint16_t d;
    int8_t string[13];
};

struct s_test my_data =
{
    .a = 0xABCDEF09,
    .b = 0xFF,
    .c = 0xDEADBEEFDEADBEEF,
    .d = 0x9876,
    .string = "bla bla bla"
};

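void save()
{
    FILE * f;

    /* Binary mode ("wb"): text mode on Windows would translate 0x0A bytes */
    f = fopen("test.bin", "wb");
    if (f == NULL)
    {
        perror("test.bin");
        return;
    }

    /* Fix endianness */
    my_data.a = UIP_HTONL(my_data.a);
    my_data.c = UIP_HTONLL(my_data.c);
    my_data.d = UIP_HTONS(my_data.d);

    fwrite(&my_data, sizeof(my_data), 1, f);
    fclose(f);
}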

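void read()
{
    FILE * f;
    size_t n;

    /* Binary mode ("rb") so that Windows does not alter the bytes */
    f = fopen("test.bin", "rb");
    if (f == NULL)
    {
        perror("test.bin");
        return;
    }
    n = fread(&my_data, sizeof(my_data), 1, f);
    fclose(f);
    if (n != 1)
    {
        return; /* short or failed read: my_data is not valid */
    }

    /* Fix endianness */
    my_data.a = UIP_HTONL(my_data.a);
    my_data.c = UIP_HTONLL(my_data.c);
    my_data.d = UIP_HTONS(my_data.d);
}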

int main(int argc, char ** argv)
{
    save();
    return 0;
}

That's the dump of the saved file:

fanl@fanl-ultrabook:~/workspace-tmp/test3$ hexdump -v -C test.bin 
00000000  ab cd ef 09 ff de ad be  ef de ad be ef 98 76 62  |..............vb|
00000010  6c 61 20 62 6c 61 20 62  6c 61 00 00              |la bla bla..|
0000001c
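The dump is 0x1c = 28 bytes long, which matches 4 + 1 + 8 + 2 + 13 with no padding. A possible way to have the compiler check that layout on every platform is a compile-time assertion. This is only a sketch: it assumes a C11 compiler for `_Static_assert` and the GCC/Clang `__attribute__((__packed__))` extension used in the answer.

```c
#include <stddef.h>
#include <stdint.h>

struct __attribute__((__packed__)) s_test
{
    uint32_t a;
    uint8_t b;
    uint64_t c;
    uint16_t d;
    int8_t string[13];
};

/* The build fails if the compiler lays the struct out differently. */
_Static_assert(sizeof(struct s_test) == 28, "unexpected size of s_test");
_Static_assert(offsetof(struct s_test, c) == 5, "unexpected offset of c");
_Static_assert(offsetof(struct s_test, d) == 13, "unexpected offset of d");
_Static_assert(offsetof(struct s_test, string) == 15, "unexpected offset of string");
```

The offsets correspond to the positions visible in the hexdump (for example, 98 76 at offset 13).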

Packed `struct`s are non-standard. Proper serialisation is the more portable approach. — too honest for this site, Jan 11 '16 at 21:52
@fanl, I'm trying to understand this code. Shouldn't the comment line # else /* UIP_BYTE_ORDER == UIP_BIG_ENDIAN */ read ... == UIP_LITTLE_ENDIAN ? — Marnix, Jan 11 '16 at 21:56
@Marnix: The comment is not code that should run; it is a practice where every `#else` and `#endif` carries a comment telling which `#if` it belongs to. It is particularly useful if you have nested conditional macros. — Felipe Lavratti, Jan 11 '16 at 22:45
@Felipe, thank you for the clarification. I thought the comment referred to what was coming in the ELSE statement. I have 2 questions regarding your solution. 1. In the read() function, shouldn't the /* Fix endianness */ come after the fread() statement rather than before? 2. I found some examples for determining the endianness of the platform but they are all determined at runtime (dereference an int 1 to char and check for 0 or 1 value). This runtime approach wouldn't work with the macro, right? — Marnix, Jan 12 '16 at 10:36
@Marnix: You are right, the endianness fix should be done after reading. I'll edit the answer to fix it. And yes, once you detect endianness at run time you will need a run-time function to fix it. The proposed macros are resolved at build time. There is a non-standard way of knowing the endianness at build time; depending on your toolchain this can work: search for "endian" in this link https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html — Felipe Lavratti, Jan 12 '16 at 16:25
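Both detection styles discussed in these comments could look roughly like the sketch below. The build-time part relies on the GCC predefined macros `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__` from the linked page, so it is not standard C and other compilers may not define them. `host_is_little_endian` is a made-up name for the run-time check Marnix describes.

```c
#include <stdint.h>

#define UIP_BIG_ENDIAN 1
#define UIP_LITTLE_ENDIAN 2

/* Build-time detection (GCC/Clang extension, not standard C). */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#   define UIP_BYTE_ORDER UIP_BIG_ENDIAN
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#   define UIP_BYTE_ORDER UIP_LITTLE_ENDIAN
#else
#   error "Cannot detect byte order; define UIP_BYTE_ORDER manually"
#endif

/* Run-time detection: look at the first byte in memory of the value 1. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

The run-time function cannot feed the preprocessor, so it would have to be paired with run-time swap functions rather than with the UIP_HTON* macros.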

dbush · Accepted Answer · 2016-01-11T18:51:51.563

This is a good approach. If all fields are integer types of a specific size such as int32_t, int64_t, or char, and you read/write the appropriate number of them to/from arrays, you should be fine.
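As a rough sketch of the per-struct write/read functions the question describes: the struct `record` and all function names below are hypothetical, and the file format is assumed to store every integer most significant byte first. Shifting the bytes out one by one means the code does not need to know the host's byte order at all.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct for illustration; not taken from the question. */
struct record {
    int32_t id;
    int64_t stamp;
    char    name[16];
};

/* Write the low n bytes of v, most significant byte first. */
static int put_be(FILE *f, uint64_t v, int n)
{
    unsigned char buf[8];
    for (int i = 0; i < n; i++)
        buf[i] = (unsigned char)(v >> (8 * (n - 1 - i)));
    return fwrite(buf, 1, (size_t)n, f) == (size_t)n;
}

/* Read n bytes stored most significant byte first into *v. */
static int get_be(FILE *f, uint64_t *v, int n)
{
    unsigned char buf[8];
    if (fread(buf, 1, (size_t)n, f) != (size_t)n)
        return 0;
    *v = 0;
    for (int i = 0; i < n; i++)
        *v = (*v << 8) | buf[i];
    return 1;
}

/* Both functions return 1 on success and 0 on failure. */
int write_record(FILE *f, const struct record *r)
{
    return put_be(f, (uint32_t)r->id, 4)
        && put_be(f, (uint64_t)r->stamp, 8)
        && fwrite(r->name, 1, sizeof r->name, f) == sizeof r->name;
}

int read_record(FILE *f, struct record *r)
{
    uint64_t id, stamp;
    if (!get_be(f, &id, 4) || !get_be(f, &stamp, 8))
        return 0;
    if (fread(r->name, 1, sizeof r->name, f) != sizeof r->name)
        return 0;
    /* Casting back to signed is implementation-defined for large values;
       GCC wraps, which gives the original negative number. */
    r->id = (int32_t)(uint32_t)id;
    r->stamp = (int64_t)stamp;
    return 1;
}
```

The file should be opened in binary mode ("wb" / "rb") on both platforms, otherwise Windows may translate some bytes.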

The one thing you need to watch out for is endianness. Any integer type should be written in a known byte order and read back in the proper byte order for the system in question. The simplest way to do this is with the ntohs and htons functions for 16-bit ints and the ntohl and htonl functions for 32-bit ints. There are no corresponding standard functions for 64-bit ints, but they shouldn't be too difficult to write.

Here's a sample of how you could write these functions for 64 bit:

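#include <stdint.h>
#include <string.h>

uint64_t htonll(uint64_t val)
{
    uint8_t v[8];
    uint64_t result;
    int i;

    /* Store the most significant byte first (network byte order). */
    for (i=0; i<8; i++) {
        v[i] = (uint8_t)(val >> ((7-i) * 8));
    }

    /* memcpy avoids the alignment and strict-aliasing problems of
       reading the byte array through a uint64_t pointer. */
    memcpy(&result, v, sizeof result);
    return result;
}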

uint64_t ntohll(uint64_t val)
{
    uint8_t *v = (uint8_t *)&val;
    uint64_t result = 0;
    int i;

    for (i=0; i<8; i++) {
        result |= (uint64_t)v[i] << ((7-i) * 8);
    }

    return result;
}

edited Jan 11 '16 at 18:51

answered Jan 11 '16 at 18:43

dbush


I'd leave loop-unrolling to the compiler here. Your code will generate warnings if truncation errors are enabled. For the serialisation, instead of masking, cast to `uint8_t`. For the deserialisation, the shifts will not work as expected, because `v[i]` will be promoted to `int`. Thus for 32 bit `int` e.g. the shifts >=32 invoke undefined behaviour, the 24-position shift for some values, too. But the idea as such is correct. – too honest for this site Jan 11 '16 at 18:45
For `ntohll` I'd pass a `const char [8]`; not sure why you cast here. The `uint64_t` creates problems and is useless actually. With the loops, you can make both functions generic (pass the value as `uint64_t`, but allow to convert 1..8 octets), actually. – too honest for this site Jan 11 '16 at 18:56
@Olaf, you mean the ntohll() header should be "uint64_t ntohll(const char[8])" ? – Marnix Jan 11 '16 at 21:43
@Marnix: Yes. Just pass the part of the buffer to the function (you don't need the first line then, of course). And adding another argument with the size and using as loop-max. allows to process all widths with just a single function per direction. – too honest for this site Jan 11 '16 at 21:49
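The generic variant suggested in these comments, with a buffer and a size argument, might look like the sketch below. The names `encode_be` and `decode_be` are invented here, and this is one reading of the suggestion rather than code from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low `size` octets of val in buf, most significant first. */
void encode_be(unsigned char *buf, uint64_t val, size_t size)
{
    assert(size >= 1 && size <= 8);
    for (size_t i = 0; i < size; i++)
        buf[i] = (unsigned char)(val >> ((size - 1 - i) * 8));
}

/* Rebuild a value from `size` octets stored most significant first. */
uint64_t decode_be(const unsigned char *buf, size_t size)
{
    uint64_t result = 0;
    assert(size >= 1 && size <= 8);
    for (size_t i = 0; i < size; i++)
        result = (result << 8) | buf[i];
    return result;
}
```

With size 2, 4 or 8 this covers 16-, 32- and 64-bit fields with one function per direction; the buffer is what gets passed to fwrite() and fread().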
I looked into this Endian thing a bit further. This may be a stupid question, but I'll try anyway. This Big/Little Endian seems to come down to reversing the byte order in integers. What if I just write a function to swap the byte order (0<->7, 1<->6, etc). So if my write program detects it has the 'wrong' Endian format it will simply swap all integers before writing them to the file. Likewise, if the reading program detects it's on the wrong Endian platform it will swap all integers after reading them from the file. If the platform has the right Endian they do nothing. Am I missing something? – Marnix Jan 12 '16 at 20:29
@Marnix The trick is knowing whether you are big-endian or little-endian. Each operating system/platform has its own way of designating this in the system header files. The functions I gave above are portable. `htonll` takes a 64-bit number in the local byte order and changes it to network byte order (i.e. big-endian). Similarly, `ntohll` takes a 64-bit number in network byte order and changes it to the host's byte order. How it does this doesn't depend on what the host byte order is. – dbush Jan 12 '16 at 20:34
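For comparison, the swap-based idea from Marnix's comment could be sketched as follows, assuming the file format is big-endian and detecting the host order at run time. `swap64` and `host_to_file64` are hypothetical names. Since swapping is its own inverse, the same function would serve for both writing and reading.

```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value (bytes 0<->7, 1<->6, ...). */
uint64_t swap64(uint64_t v)
{
    /* Swap adjacent bytes, then adjacent 16-bit pairs, then the halves. */
    v = ((v & UINT64_C(0x00FF00FF00FF00FF)) << 8) |
        ((v >> 8) & UINT64_C(0x00FF00FF00FF00FF));
    v = ((v & UINT64_C(0x0000FFFF0000FFFF)) << 16) |
        ((v >> 16) & UINT64_C(0x0000FFFF0000FFFF));
    return (v << 32) | (v >> 32);
}

/* Convert between host order and a big-endian file (either direction). */
uint64_t host_to_file64(uint64_t v)
{
    const uint16_t probe = 1;
    int little = *(const unsigned char *)&probe == 1;
    return little ? swap64(v) : v;
}
```

The shift-based functions in the answer reach the same bytes without needing the detection step.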
