As described in the article you link to, the 4GB address space available on a 32-bit CPU is divided into two parts: user mode or application address space, and kernel mode address space.
The user-mode address space is per-process. Each process has its own mapping from pages in the user-mode address space to physical memory (or to pages that are currently swapped out to disk).
The kernel-mode address space is the same regardless of which process is currently running. Otherwise, the address space would have to be remapped on every transition to kernel mode, which would be very inefficient. (The article does say that, but only very briefly: "the operating system makes its virtual memory visible in the address space of every process".)
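By default, 32-bit Windows divides this up evenly, 2GB for user space and 2GB for kernel space, but it can be configured to divide it up 3GB/1GB instead (the /3GB boot option on older releases, or the increaseuserva setting in BCDEdit on Vista and later). Even then, only processes whose executable is flagged as large address aware get the extra gigabyte.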
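To make the split concrete, here is a minimal sketch that classifies a 32-bit virtual address as user or kernel under either configuration. The function name and constants are my own, not anything Windows exposes; the boundaries are simply 2GB and 3GB written as addresses, and the sketch ignores the small no-access regions at the edges of the user range.

```python
GIB = 1 << 30  # one "GB" as the text uses the term (2**30 bytes)

def region_32bit(address, user_gib=2):
    """Classify a 32-bit virtual address as 'user' or 'kernel'.

    user_gib is the size of the user-mode part: 2 for the default
    split, 3 for the 3GB/1GB configuration.
    """
    if not 0 <= address < 4 * GIB:
        raise ValueError("not a 32-bit address")
    boundary = user_gib * GIB  # 0x80000000 or 0xC0000000
    return "user" if address < boundary else "kernel"

print(region_32bit(0x7FFFFFFF))              # user
print(region_32bit(0x80000000))              # kernel
print(region_32bit(0x80000000, user_gib=3))  # user
```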
On x64 Windows, the kernel runs in 64-bit mode, so it has access to the full virtual address space implemented by the CPU, which is currently 48 bits, or 256TB. The first x64 Windows releases used only 16TB of address space, divided evenly: 8TB for the application address space (for 64-bit applications) and 8TB for the kernel. In Windows 8.1 this was increased to use the entire 256TB allowed by the CPU, again divided evenly: 128TB for 64-bit applications, 128TB for the kernel.
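The figures above are just powers of two, which a few lines of arithmetic can confirm. This is only a sanity check; the helper name is mine, and the 44-bit figure is my own way of expressing the 16TB that early x64 Windows used, not something taken from the documentation.

```python
TIB = 1 << 40  # one "TB" as the text uses the term (2**40 bytes)

def even_split_tib(address_bits):
    """Return (total, user, kernel) sizes in TB for an evenly divided space."""
    total = (1 << address_bits) // TIB
    return total, total // 2, total // 2

print(even_split_tib(44))  # (16, 8, 8): early x64 Windows
print(even_split_tib(48))  # (256, 128, 128): Windows 8.1 and later
```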
32-bit applications run in the WOW64 environment, with the CPU in compatibility mode (the 32-bit sub-mode of long mode) while application code is executing. The kernel, however, always runs in 64-bit mode. When a kernel transition is necessary, the CPU must be switched from compatibility mode to 64-bit mode, which also means that it switches from the 32-bit address space to the 64-bit address space. x64 CPUs are designed so that this transition is efficient.
As a result, there is no need for any of the 32-bit address space to be reserved for the kernel.
To ensure backwards compatibility, a 32-bit process whose executable is not flagged as large address aware is still restricted to 2GB of address space. If the executable is large address aware, the process gets all 4GB.
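If you want to check whether a particular executable carries that flag, it is a single bit (IMAGE_FILE_LARGE_ADDRESS_AWARE, 0x0020) in the Characteristics field of the PE file header. The following is a minimal sketch that reads it from raw bytes; the function name is my own, and it does only the bare minimum of validation.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field

def is_large_address_aware(image: bytes) -> bool:
    """Return True if a PE image is flagged as large address aware."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the 4-byte signature;
    # Characteristics is its last two bytes (offset 18 within the header).
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

You would call it as `is_large_address_aware(open("app.exe", "rb").read())`, where `app.exe` is a placeholder for whatever file you are inspecting. The flag itself is normally set at link time with the /LARGEADDRESSAWARE linker option.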
You should note that this really is address space, not memory or even virtual memory. A 32-bit application can use file mappings and other methods to make use of more than 4GB of memory.
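As a rough illustration of the file-mapping approach, the sketch below walks through a file of any size while keeping only one fixed-size window mapped at a time, so the address space it needs is bounded by the window rather than by the amount of data. It uses Python's portable mmap module rather than the Win32 calls directly; the function name, the default window size, and the byte-counting task are all my own choices for the example.

```python
import mmap

def count_byte_windowed(path, needle, window_size=64 * 1024 * 1024):
    """Count occurrences of a single byte, mapping one window at a time."""
    if len(needle) != 1:
        raise ValueError("needle must be exactly one byte")
    # Mapping offsets must be multiples of the allocation granularity,
    # so round the window down to a multiple (but never below one unit).
    granularity = mmap.ALLOCATIONGRANULARITY
    window_size = max(granularity, window_size - window_size % granularity)
    total = 0
    with open(path, "rb") as f:
        size = f.seek(0, 2)  # seeking to the end returns the file size
        offset = 0
        while offset < size:
            length = min(window_size, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as view:
                # view[:] copies the window into a bytes object; fine for a sketch.
                total += view[:].count(needle)
            offset += length
    return total
```

Each window is unmapped before the next one is mapped, so even a process with 2GB of address space could work through a much larger file this way.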
You should also note that the fact that the process has access to 2GB/3GB/4GB of address space does not mean that the application can use all of that space. Windows reserves some user-mode address space in each process for itself.
Address space and other limits are documented here: Memory Limits for Windows and Windows Server Releases.