Natural alignment refers to the size of the variable, not the size of the processor register and/or data path. A floating point double is 8 bytes, and so its natural alignment is 8 bytes. To be more precise, the natural alignment is the smallest power of 2 that is large enough to hold the variable. That definition covers the case of "long double", or x86 extended precision, which is a 10-byte variable and whose natural alignment is therefore 16 bytes. For x86 processors, see the optimization manual and search for "alignment"; you will find this is a subject rich in detail, and the specifics vary by micro-architecture, even within the same processor family. In particular, section 3.6.4 Alignment says:
For best performance, align data as follows:
- Align 8-bit data at any address.
- Align 16-bit data to be contained within an aligned 4-byte word.
- Align 32-bit data so that its base address is a multiple of four.
- Align 64-bit data so that its base address is a multiple of eight.
- Align 80-bit data so that its base address is a multiple of sixteen.
- Align 128-bit data so that its base address is a multiple of sixteen.
The Pentium 4 is a 32-bit processor, part of the IA-32 family, yet it has a 64-bit data path (Front Side Bus). There are also 32-bit processors that have only 16-bit buses; see the historical perspective on 32-bit computing. Accessing a variable at an alignment other than its natural alignment may result in a performance penalty or an alignment fault, depending on the processor, in some cases the setting of a control bit, the type of variable, the instruction used, and so on.
The actual alignment is up to the compiler and the calling conventions. For structures, the requirement is that the first member variable must be at offset 0 (zero) and that variables must be allocated in the order they are declared; padding may be inserted between variables for alignment, and after the last variable to pad the size of the structure. In 32-bit Windows the stack is only required to be 4-byte aligned, so the compiler would have to generate extra code to ensure 8-byte alignment of a double allocated on the stack.
In Agner Fog's Calling Conventions document you will find details on the alignment used in different operating systems and by different compilers. The stack has a 4-byte alignment in 32-bit Windows, which explains why you may have observed a floating point double aligned at a 4-byte but not 8-byte boundary when allocated on the stack: when a function gets called, the compiler has no way of knowing whether the stack will be 8-byte aligned or not. Table 2 of that document shows the alignment of various data types allocated in static storage, as implemented by various compilers; you will notice that in 32-bit Windows the only compiler that allows 4-byte alignment for double is the Borland compiler.

According to that document, when allocating in a structure the Borland compiler allows a double to be at any byte offset (which I find surprising).

Here's the text description from the document, copied here for reference:
Table 3 shows the alignment in bytes of data members of structures
and classes. The compiler will insert unused bytes, as required,
between members to obtain this alignment. The compiler will also
insert unused bytes at the end of the structure so that the total size
of the structure is a multiple of the alignment of the element that
requires the highest alignment. Many compilers have options to change
the default alignments. Differences in structure member alignment will
cause incompatibility between different programs or modules accessing
the same data and when data are stored in binary files. The programmer
can avoid such compatibility problems by ordering the structure
members so that no unused bytes need to be inserted. Likewise, the
padding at the end of the structure may be specified explicitly by
inserting dummy members of the required size. The size of the virtual
table pointer, if any, must be taken into account (see chapter 11).
5 Stack alignment
The stack pointer must be aligned by the stack word
size at all times. Some systems require a higher alignment. The Gnu
compiler version 3.x and later for 32-bit Linux and Mac OS X makes the
stack pointer aligned by 16 at every function call instruction.
Consequently it can rely on ESP = 12 modulo 16 at every function
entry. This alignment is not consistently implemented. It is
specified in the Mac OS ABI, but nowhere else. The stack is not
aligned when compiling with option -Os or
-mpreferred-stack-boundary=2, but apparently the Gnu compiler erroneously relies on the stack being aligned by 16 despite these
options. The Intel compiler (v. 9.1.038) for 32 bit Linux does not
have the same alignment. (I have submitted bug reports to Gnu and
Intel about this in 2006. In 2009 Intel added a
-falign-stack=assume-16-byte option to ICC version 11.0 to fix the problem). The
stack is aligned by 4 in 32-bit Windows. The 64 bit systems keep the
stack aligned by 16. The stack word size is 8 bytes, but the stack
must be aligned by 16 before any call instruction. Consequently, the
value of the stack pointer is always 8 modulo 16 at the entry of a
procedure. A procedure must subtract an odd multiple of 8 from the
stack pointer before any call instruction. A procedure can rely on
these rules when storing XMM data that require 16-byte alignment. This
applies to all 64 bit systems (Windows, Linux, BSD). Where at least
one function parameter of type __m256 is transferred on the stack,
Unix systems (32 and 64 bit) align the parameter by 32 and the called
function can rely on the stack being aligned by 32 before the call
(i.e. the stack pointer is 32 minus the word size modulo 32 at the
function entry). This does not apply if the parameter is transferred
in a register. Various methods for aligning the stack are described
in Intel's application note AP 589 "Software Conventions for
Streaming SIMD Extensions", "Data Alignment and Programming Issues
for the Streaming SIMD Extensions with the Intel® C/C++ Compiler", and
"IA-32 Intel ® Architecture Optimization Reference Manual".