Why does a 2-byte variable need to be aligned at a multiple of 2 with a 32-bit (4-byte) compiler?

Question

I'm learning Cpp from the basics, and while studying structure padding and data alignment I got confused by the following:

    struct A {
        char a;
        double b;
        char c;
    };

    struct B {
        char a;
        short b;
        char c;
    };

I am using the MSVC++ 14.0 32-bit compiler.

I understood the concept of data alignment by referring to various pages on the web.

In the first case, sizeof(A) gives 24: 1 (a) + 7 (padding) + 8 (b) + 1 (c) + 7 (trailing struct padding; since double is the largest member, the struct is padded to a multiple of 8). Here I understood that, because of cache lines (usually 32 or 64 bytes), the 8-byte double starts at an address that is a multiple of 8 rather than a multiple of 4. Even though the compiler is 32-bit (4 bytes), either placement requires 2 memory cycles, but a multiple of 8 is chosen so that the value is never split across 2 cache lines.

But in the second case, sizeof(B) yields 6: 1 (a) + 1 (padding) + 2 (b) + 1 (c) + 1 (padding). This is of course the value expected from the rule (a value of x bytes starts at an address that is a multiple of x). But here I don't understand why b needs to start at an address that is a multiple of 2.
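The two layouts discussed so far can be inspected directly. This is a minimal sketch, assuming a C++11 compiler; `layout_rules_hold` and `print_layouts` are hypothetical helper names that are not part of the original question, and the printed numbers are implementation-defined.

```cpp
#include <cassert>   // assert
#include <cstddef>   // offsetof
#include <cstdio>    // std::printf

struct A { char a; double b; char c; };
struct B { char a; short b; char c; };

// Hypothetical helper: checks the rules that hold whatever the concrete
// numbers are. b sits on a multiple of short's alignment, and each
// struct's size is a multiple of its own alignment, so every element of
// an array of A or B stays aligned.
bool layout_rules_hold() {
    return offsetof(B, b) % alignof(short) == 0
        && sizeof(A) % alignof(A) == 0
        && sizeof(B) % alignof(B) == 0;
}

// Hypothetical helper: prints size, alignment and member offsets.
// The values depend on the compiler and the target ABI.
void print_layouts() {
    assert(layout_rules_hold());
    std::printf("A: size=%zu align=%zu offsets a=%zu b=%zu c=%zu\n",
                sizeof(A), alignof(A),
                offsetof(A, a), offsetof(A, b), offsetof(A, c));
    std::printf("B: size=%zu align=%zu offsets a=%zu b=%zu c=%zu\n",
                sizeof(B), alignof(B),
                offsetof(B, a), offsetof(B, b), offsetof(B, c));
}
```

Calling `print_layouts()` from `main` on the 32-bit MSVC build described here should report the sizes 24 and 6 quoted above. Other ABIs may differ; some 32-bit x86 ABIs, for instance, align `double` members to 4 rather than 8.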

Assuming that in struct B, a is placed at 0x00, why can't b be placed at 0x01 and c at 0x03? Even in this case, any of the variables (a, b or c) can be fetched in a single memory cycle (since the compiler is 4-byte), and no value straddles 2 cache lines. I don't understand the purpose of the padding in this case.
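The layout proposed here (a at offset 0, b at 1, c at 3) can be requested explicitly. This is a sketch assuming the `#pragma pack` extension, which MSVC, GCC and Clang accept but the C++ standard does not define; `PackedB`, `read_b` and `compare_layouts` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>   // assert
#include <cstddef>   // offsetof
#include <cstdio>    // std::printf

struct B { char a; short b; char c; };   // default layout with padding

#pragma pack(push, 1)                    // compiler extension: 1-byte packing
struct PackedB { char a; short b; char c; };
#pragma pack(pop)

static_assert(offsetof(PackedB, b) == 1, "b directly follows a");
static_assert(sizeof(PackedB) == 4, "no padding bytes remain");

// Reading the member by name lets the compiler emit whatever access
// sequence the target needs for a possibly misaligned short.
short read_b(const PackedB& p) { return p.b; }

// Hypothetical helper: prints both layouts side by side.
void compare_layouts() {
    PackedB p{'x', 1234, 'y'};
    assert(read_b(p) == 1234);
    std::printf("B:       size=%zu align=%zu offset of b=%zu\n",
                sizeof(B), alignof(B), offsetof(B, b));
    std::printf("PackedB: size=%zu align=%zu offset of b=%zu\n",
                sizeof(PackedB), alignof(PackedB), offsetof(PackedB, b));
}
```

Note that with 1-byte packing the struct's own alignment drops to 1, so in an array of `PackedB` nothing keeps `b` on an even address. With the default layout, every `b` in an array of `B` stays aligned.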

Please help! I'm just curious to know what advantage the padding adds here.

@πάνταῥεῖ I do not think that dupe is right. The OP knows about padding. They are asking why `short` needs to be aligned to an even address. — NathanOliver, Jun 19 '17 at 18:20
@Nathan The answers in the dupe also explain why, and what the advantages are. — πάντα ῥεῖ, Jun 19 '17 at 18:20
@πάνταῥεῖ, I have gone through the thread but couldn't get an exact answer. Can you please explain or elaborate on it? — infinite loop, Jun 19 '17 at 18:27
@infiniteloop What specifically didn't you understand from the answers (and linked material) there? Can you add that into your question please, I might consider reopening it then. — πάντα ῥεῖ, Jun 19 '17 at 18:30
@πάνταῥεῖ, I didn't understand anything about buses. I couldn't find a case which explains what I asked. I'm just trying to be good at C and Cpp, so I started from the basics. I would be grateful if you could explain my case. That confusion is still stuck in my head. — infinite loop, Jun 19 '17 at 18:35
@infiniteloop It is about how the CPU is connected to the memory chips or its internal cache, and how memory is connected to the CPU. These want natural addresses as multiples of 32 or 64 (depending on target architecture) to work (fetch/store) efficiently with the data (as the linked [wikipedia article](https://en.wikipedia.org/wiki/Data_structure_alignment) also explains well). — πάντα ῥεῖ, Jun 19 '17 at 18:42
@πάνταῥεῖ I don't think that explains anything. Cache lines are automatically aligned, so memory never has to deal with unaligned accesses. Unaligned accesses to cache are not inherently a problem and certainly allowed by x86 (it's not even slow, except in special cases), the platform OP is compiling for. If that whole thing was aligned by 4 and the short is aligned, nothing bad would ever happen. — harold, Jun 19 '17 at 18:46
@πάντα ῥεῖ, Re "*The answers in the dupe also explain why*", they say why padding is used, but I don't see any that answers the OP's question: why is the short aligned to even positions? — ikegami, Jun 19 '17 at 18:46
"cpp" is the C preprocessor, C++ is the language you use and C is a different, unrelated language. And your question is not really about programming, but digital hardware in general. The dup's accepted answer answers your question, too. If you don't understand it, you might lack the necessary basics to understand it. Not your fault, just go on learning. Said that: we are not a teaching site. — too honest for this site, Jun 19 '17 at 18:47
@infinite loop, You say you expect value of x bytes to start at a location which is multiple of x, but then you ask why a 2-byte short starts at a multiple of 2? I don't get it. — ikegami, Jun 19 '17 at 18:48
@ikegami `short` (16 bits) is aligned to the next multiple of 32 as the CPU works efficiently with addresses conforming to multiples of 32. — πάντα ῥεῖ, Jun 19 '17 at 18:49
If it could fetch the 4 bytes chunk in a single cycle, why can't it access a two bytes within it and discard the other (it does the same i guess when the 4 bytes chunk is of chars) — infinite loop, Jun 19 '17 at 18:49
@πάντα ῥεῖ, Not according to the question. If that's true, there's yet another problem with the question. — ikegami, Jun 19 '17 at 18:50
@infiniteloop _"why can't it access a two bytes within it and discard the other (it does the same i guess when the 4 bytes chunk is of chars)"_ Because the hardware is designed like that. — πάντα ῥεῖ, Jun 19 '17 at 18:50
@πάνταῥεῖ but it isn't. x86 can access an unaligned word (or dword, qword and even oword) just fine. It just shouldn't cross a page boundary, or back on Core2, it shouldn't cross a cache line boundary. — harold, Jun 19 '17 at 18:51
Then how is it possible in the case of 4 chars in a single 4-byte chunk? In my generated assembly it accesses the value directly by its byte address, without any masking... — infinite loop, Jun 19 '17 at 18:53
@infiniteloop Read [here](https://en.wikipedia.org/wiki/Address_bus) and further from the links. Aren't you guys and gals learning about that basic stuff in your IT courses anymore? — πάντα ῥεῖ, Jun 19 '17 at 18:58
@infiniteloop: Please provide a reference to the standard requiring a specific alignment (or even size) of data type. Or disallowing the observed behaviour. — too honest for this site, Jun 19 '17 at 18:59
@harold: Read the (very long) instruction timing information for x86 (one of the most complicated for any CPU). You might be surprised what unaligned accesses cost. And there's not only the CPU, but also the PCIe root complex, the DRAM controller, caches, etc. Before stating "there will be no problem", you really should check the whole data path. It's really interesting (and takes some months to fully work through - for an expert). — too honest for this site, Jun 19 '17 at 19:01
@Olaf Unaligned accesses are mostly a non-issue since Nehalem, though crossing various boundaries can cause some smaller issues, especially on AMD (not that I care much about AMD tbh). The cost of crossing a cache line boundary is nowadays essentially as if both parts were accessed individually. Some store-load forwarding failure can also occur, usually not a big deal. Crossing a page boundary is still bad, aligning the whole thing by 4 will take care of that. Of course if you have anything that contradicts that I'll be interested. — harold, Jun 19 '17 at 19:12
@harold: Tell this to the HPC people and the embedded people. And then there is this ABI thingy which requires a specific layout of data with external linkage. Anyway, this is not the place nor the time for discussion. But you might have noticed why the question is far too broad! — too honest for this site, Jun 19 '17 at 19:18
@harold Did you ever work with really small targets, or are you a PC mouse clicker? — πάντα ῥεῖ, Jun 19 '17 at 19:23
@Olaf HPC people know that, that's why the "split the load and shuffle bytes" style of unaligned accesses (when they were otherwise unavoidable) isn't used after Core2. Of course most vector loads and stores should still be aligned. — harold, Jun 19 '17 at 19:25
@πάνταῥεῖ surely targets other than what OP specified are off-topic? I used to do z80 before x86. — harold, Jun 19 '17 at 19:28
@harold Oh, Z80 qualifies you being a greybeard like me. No offense, sorry. — πάντα ῥεῖ, Jun 19 '17 at 19:30
@harold: z80 is large compared to typical MCUs. And as an 8-bit CPU with an 8-bit bus, it has only minor alignment issues when using DRAM and crossing pages. Try using DSPs and modern RISC CPUs which don't support unaligned accesses or use multiple accesses to read the parts of a longer word (which x86 will do as well actually - they are **always** slower, it might just be shadowed mostly for normal programs). — too honest for this site, Jun 19 '17 at 19:34
@Olaf yes I know there can be a lot of trouble on other platforms, but as you see OP was talking about MSVC. As for x86, there are multiple accesses when crossing a cache line boundary (which is easily avoided here by aligning the whole thing by 4), otherwise it is even atomic which I doubt Intel would have wanted to commit to if it took multiple accesses (performance testing has also not shown multiple accesses for non-crossing unaligned accesses, at least not that I've ever seen and again I would be interested if you have data showing otherwise). — harold, Jun 19 '17 at 19:42
@harold: 1) OP talks about MSVC++. C++ is not C! 2) AFAIK MSVC++ also supports at least ARMv7-A/v8-A. 3) Even for x86, it does not change the facts. — too honest for this site, Jun 19 '17 at 19:45
@Olaf MSVC is the name of the C++ compiler as well. And what facts, you're not giving any evidence for *anything*, you're just trying to bluff and use arguments from authority. — harold, Jun 19 '17 at 19:48
@harold I listed various reasonable factors, I clearly did not argue from authority. I did not see any logical argument from your side for/against whatever. This discussion is a clear indicator the question is not well suited here. However, you have the last word. — too honest for this site, Jun 19 '17 at 19:52
