
I previously posted a question here about aligned access during pointer casting. In summary, to be fully portable it's better to avoid unaligned access, because some architectures may throw an exception, or the access may be considerably slower than an aligned access.

However, there are cases where I want one-byte alignment. For example, when transferring network data I don't want extra padding added inside the structure. What's usually done here is:

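```c
#include <stdint.h>

typedef uint8_t  u8;   /* assumed definitions; the original snippet does not show them */
typedef uint16_t u16;

#pragma pack (push, 1)
struct tTelegram
{
   u8  cmd;
   u8  index;
   u16 addr1_16;
   u16 addr2_16;
   u8  length_low;
   u8  data[1];
};
#pragma pack (pop)
```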

You can probably guess my question: if I enforce one-byte alignment on my struct, does that mean it can't be fully portable, because the struct members are no longer aligned? What if I want both no padding and portability?

Micha Wiedenmann
Eric Z
  • Bit fields are not portable. The order in which `length_high`, `reserved`, and `next` get packed is **definitely** implementation-dependent and different compilers **will** do it differently. – Dietrich Epp Dec 17 '12 at 00:06
  • Oh yes! I just copied it from legacy code. Just ignore it; I'll remove it to focus on the problem. – Eric Z Dec 17 '12 at 00:07
  • Some good information at http://stackoverflow.com/questions/7793511/are-there-performance-issues-when-using-pragma-pack1 – Tony Delroy Dec 17 '12 at 00:54

2 Answers


Firstly, misaligned memory access refers to single pieces of data that span multiple words in memory. For example, on a 32-bit system a 32-bit int at address 0, 4, 8, etc. is aligned, but one at address 1, 2, 3, 5, 6, 7, 9, etc. would be misaligned.

Secondly, misaligned data doesn't "throw an exception" in the C++ sense, but it may raise an interrupt/trap/exception at the CPU level, e.g. SIGBUS on UNIX, which you would generally react to by installing a signal handler. However, if you need to parse misaligned data in a portable way, you wouldn't do so by catching signals; you'd manually code the steps to pack and unpack data spanning word boundaries.

In your tTelegram struct the data is not "misaligned": provided the struct itself starts at an even address, the u16 members sit at even offsets (2 and 4). Still, the process of bit shifting and masking the data as it's packed/unpacked from a register is likely slower, requiring more machine-code instructions, than using data that occupies an independent word.

Regarding portability: all non-toy compilers will have an option to pack in the way you've described, but the exact pragma will vary; the layout of bytes in multi-byte values may still be big-endian or little-endian (or something plain weird); and while some CPUs allow some misaligned data access (e.g. x86), others don't (e.g. UltraSPARC).

Tony Delroy
  • For CPUs that don't allow misaligned data access, do most compilers for that architecture "cheat" the CPU by "disassembling" the data, i.e. accessing aligned memory addresses several times and then somehow OR-ing the pieces together? Is that right? – Eric Z Dec 17 '12 at 01:14
  • @EricZ: consider a tightly packed `struct X { char x; int y; };`. Assuming an `X` is itself aligned on a word boundary (the compiler tends to take care of that), and `int` is of word size, then the `y` data member must be misaligned. The compiler can see that and should generate bit-shifting plus AND (to read) / OR (to set) the value of `y`. Compilers don't tend to guard against misaligned data access caused by unaligned pointers such as `*(int*)(1)`; in that case CPUs like the x86 would "fudge it" (probably more slowly than normal `int` access) while others like UltraSPARCs would trap. – Tony Delroy Dec 17 '12 at 01:40

When transferring data between different computers you always want to format your data. Note that a data format doesn't have to be human-readable; it can very well be binary. A binary format would include the exact position of each data item, its type, for multi-byte data the order in which the bytes appear, the size or a way to determine the size, etc. Not using a defined format will bite you, probably sooner rather than later.

Put differently, although I have seen approaches like the one you describe in use, I don't think they are normal, and they are certainly not normal when it comes to a defined format between different entities (companies for sure, probably also between different departments and/or groups). In the places where I have worked, the exact format for receiving and sending data was always defined. If the defined format can be matched with the data layout of a struct, the struct is certainly also used to decode the data, but this is known not to be portable, and code meant to be portable doesn't attempt to use facilities like this. Instead, it uses something which reads/writes the relevant records and decodes/encodes the different fields appropriately. Often the decoding/encoding code is generated from some sort of meta format describing the exact data layout.

Dietmar Kühl
  • The layout is well defined by an industrial protocol, so there's nothing I can do about it. So your point is that using one-byte alignment is not portable? – Eric Z Dec 17 '12 at 00:34
  • Indeed, byte alignment is not portable. Not all compilers support an option to force byte alignment. Nor is the order of bytes within a word always the same (big-endian vs. little-endian). – Dietmar Kühl Dec 17 '12 at 00:47
  • +1 @EricZ among other things, no, it isn't portable. The packing and alignment are entirely up to the platform, and even if your platform "lets" you specify conditional compilation therein, by definition it isn't portable. The only guaranteed way to make it portable is via a *protocol* definition that *both* sides follow **strictly**, literally down to the single octet. When it runs on big-endian, little-endian, harsh alignment environments (SPARC), etc., and *still* works, you *might* be on to something. – WhozCraig Dec 17 '12 at 00:49