I'll give you a simple answer, but for more detail I recommend AMD's architecture documents (linked below), which are very easy reading. PS: I haven't covered Xeon or PAE in any depth here.
IA-32 (x86) uses 32-bit linear (virtual) addresses, so a process can address at most 2^32 bytes = 4GB. The early IA-32 CPUs also had a 32-bit physical address bus, so 4GB was the RAM limit as well.
The operating system divides that 4GB virtual address space between the kernel and the process.
By default, Windows reserves the upper 2GB (0x80000000 and above) for kernel mode and leaves the lower 2GB for user mode, which is why 32-bit Windows historically had a 2GB usermode address space.
It's a Windows design choice, not an x86 hardware limitation: the /3GB boot option moves the boundary to 3GB user / 1GB kernel, and 32-bit Linux uses 3GB/1GB by default.
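To make the arithmetic concrete, here is a rough sketch of the 4GB address space and the user/kernel boundary. It is a simplified model, not how Windows actually checks anything: the function name `classify` and the constants are made up for illustration, and the 3GB boundary corresponds to the optional /3GB boot setting.

```python
GIB = 1 << 30

ADDRESS_SPACE = 1 << 32          # 32-bit addresses: 4 GiB in total
DEFAULT_USER_LIMIT = 2 * GIB     # default split: user mode below 0x80000000
THREE_GB_USER_LIMIT = 3 * GIB    # with /3GB: user mode below 0xC0000000


def classify(address, user_limit=DEFAULT_USER_LIMIT):
    """Return 'user' or 'kernel' for a 32-bit virtual address (simplified model)."""
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError("not a 32-bit address")
    return "user" if address < user_limit else "kernel"


print(ADDRESS_SPACE // GIB)                        # 4
print(hex(DEFAULT_USER_LIMIT))                     # 0x80000000
print(classify(0x00400000))                        # user
print(classify(0x80000000))                        # kernel
print(classify(0x80000000, THREE_GB_USER_LIMIT))   # user
```

The same address lands on different sides of the boundary depending on the split the OS picks, which is the point: the hardware only fixes the 4GB total.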
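The kernel half is protected by paging rather than by segmentation: each page table entry has a user/supervisor bit, and code running in ring 3 (user mode) faults if it touches a supervisor-only page.
The segment registers are still loaded, but Windows sets up the main code and data segments as flat (base 0, limit 4GB), so they don't separate kernel addresses from user addresses.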
IA-32 has had 32-bit registers and paging since the 80386; paging only needs protected mode, not wider addresses. Real mode is the 16-bit mode the CPU starts in, with no paging and no protection.
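For reference, a small sketch of how classic 32-bit (non-PAE) paging with 4KB pages carves up a linear address: 10 bits of page directory index, 10 bits of page table index and 12 bits of offset. The function name is my own, purely for illustration.

```python
def split_linear_address(addr):
    """Split a 32-bit linear address for classic (non-PAE) 4 KB paging."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("not a 32-bit address")
    directory_index = (addr >> 22) & 0x3FF   # top 10 bits: page directory entry
    table_index = (addr >> 12) & 0x3FF       # middle 10 bits: page table entry
    offset = addr & 0xFFF                    # low 12 bits: byte within the page
    return directory_index, table_index, offset


print(split_linear_address(0x80001234))  # (512, 1, 564)
```

All 32 bits are used up by the three fields, so nothing beyond a 32-bit address is needed to page.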
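The 36 bits belong to PAE: it widened physical addresses to 36 bits (64GB of RAM) and page table entries to 64 bits, while virtual addresses stayed at 32 bits. IA-32e is Intel's name for the 64-bit long mode that AMD introduced with AMD64. Paging to and from the HDD is done by the OS using the present bit in each page table entry, which plain IA-32 already has. The NX (no-execute) bit is located at bit 63 of the 64-bit page table entry, so it is only available with PAE or long mode paging; x64 Windows relies on it.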
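The two numbers mentioned here can be checked with a little arithmetic: 36 address bits cover 64GB, and NX is the top bit of a 64-bit page table entry. The sketch below uses the architectural flag positions for 64-bit (PAE / long mode) entries mapping 4KB pages; the helper name `decode_pte` and the sample entry are made up for illustration.

```python
PRESENT = 1 << 0     # page is in RAM; clear means access faults (e.g. paged out)
WRITABLE = 1 << 1    # writes allowed
USER = 1 << 2        # accessible from user mode (ring 3)
NX = 1 << 63         # no-execute: instruction fetches from this page fault

FRAME_MASK = ((1 << 52) - 1) & ~0xFFF   # bits 12..51: physical page frame address
PAE_PHYSICAL_BITS = 36


def decode_pte(entry):
    """Decode a few fields of a 64-bit page table entry (4 KB page)."""
    return {
        "present": bool(entry & PRESENT),
        "writable": bool(entry & WRITABLE),
        "user": bool(entry & USER),
        "no_execute": bool(entry & NX),
        "frame": entry & FRAME_MASK,
    }


print((1 << PAE_PHYSICAL_BITS) // (1 << 30))   # 64 (GiB addressable with 36 bits)

# Hypothetical entry: a present, user-readable, non-executable data page.
sample = NX | USER | PRESENT | 0x12345000
print(decode_pte(sample))
```

A 32-bit page table entry simply has no bit 63, which is why NX needs the 64-bit entry format.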
Please read the AMD architecture documents; personally I find them more informative than the Intel versions.
http://developer.amd.com/wordpress/media/2012/10/24593_APM_v21.pdf
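PS: the flat memory model predates AMD64 (32-bit Windows and Linux already used it), but in 64-bit long mode AMD64 makes it mandatory: the bases of CS, DS, ES and SS are treated as zero and segment limits are not checked. Only FS and GS keep a usable base.
However, 32-bit processes run in compatibility mode, where segmentation still works as it does on IA-32, so they still need segment registers.
A 32-bit process on AMD64 is still limited by its 32-bit pointers: at most 4GB of virtual address space, and on x64 Windows only 2GB unless the executable is marked large address aware. What 64-bit buys you is that many 32-bit processes can each have their own address space, backed by far more than 4GB of physical RAM in total.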
Hope this helps.