I've read about processor alignment on Wikipedia and in a few answers here on Stack Overflow, but there's one thing I don't understand:
If a 32-bit processor aligns to 4-byte increments, why does `struct.pack('BH', 1, 2)` add a null byte in the middle?
Even with the padding, the short doesn't start at an address divisible by 4 (only by 2). And when the processor reads a word, it reads all 4 bytes in either case, whether the short is in the middle or at the end.
The padding doesn't make room for more data either: another byte could occupy address 3 without taking any extra space, and it would still be perfectly 1-byte aligned.
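For reference, here is a minimal sketch of the behaviour I'm seeing, comparing the `struct` module's native mode (the default, no prefix) with the `=` prefix, which keeps native byte order but drops alignment padding. The variable names are just illustrative, and the sizes in the comments assume a typical platform such as x86-64, where a C `short` is 2-byte aligned and an `unsigned int` is 4-byte aligned.

```python
import struct

# Native mode ('@', the default when no prefix is given) aligns each field the
# way the platform's C compiler would: 'B' sits at offset 0, then one pad byte
# so that the 2-byte 'H' starts at an even offset.
native = struct.pack('BH', 1, 2)

# '=' keeps native byte order but uses standard sizes and no alignment padding.
packed = struct.pack('=BH', 1, 2)

print(struct.calcsize('BH'), native)    # 4 on common platforms
print(struct.calcsize('=BH'), packed)   # always 3

# The padding seems to follow each field's own size, not a fixed 4-byte word:
print(struct.calcsize('HB'))  # 3: struct never adds trailing padding
print(struct.calcsize('BI'))  # 8 where a native 'I' is 4-byte aligned
```

So the pad byte appears to depend on the alignment of `H` itself (2 bytes), which is exactly the part I can't reconcile with the "4-byte increments" explanation.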