If the process is running as a 32-bit process, most OSes leave only about 2GB of the 4GB address space for the process itself; the other 2GB is mapped for the kernel (so that entering the kernel on a system call doesn't require switching to a different address space).
Even if your machine has 8 GB of RAM, or 2 GB of RAM plus 2 GB of swap, each 32-bit process can still only allocate and address about 2 GB. PAE and similar extensions let the OS use more physical memory, but they don't enlarge a single process's virtual address space.
This causes a few problems. One, you may not have enough raw address space to hold the total size of all your allocations. Two, you may not have a single contiguous chunk of address space as large as the array you need - Java and several other VM environments divide the heap into separate regions (for example a young generation, an old generation, and in some collectors a dedicated space for large objects), and each partition leaves smaller contiguous regions.
In a 64-bit process, the address space restrictions are nearly gone; however, you may still not have enough contiguous, committable memory within whatever limit Java is allowed to use. If you cap Java at a total of 2GB (for example with -Xmx2g), you may still have trouble finding a contiguous chunk that big to satisfy the request.
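To see where you stand before attempting the allocation, you can ask the JVM what it's allowed to use. Here's a minimal sketch (the class name and the 2GB request size are just illustrative placeholders, not anything from your setup):

```java
// Minimal sketch: compare the JVM's memory limits against the size of the
// array you are about to allocate, then try the allocation.
public class BigAllocationCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long requested = 2_000_000_000L;            // placeholder: bytes needed for the byte[]

        System.out.println("max heap  : " + rt.maxMemory());   // -Xmx (or the default), in bytes
        System.out.println("committed : " + rt.totalMemory()); // heap currently reserved from the OS
        System.out.println("free      : " + rt.freeMemory());  // unused part of the committed heap

        try {
            // Even when 'requested' is below maxMemory, this can still fail if the
            // JVM cannot find/commit a large enough contiguous region for the array.
            byte[] big = new byte[(int) requested];
            System.out.println("allocated " + big.length + " bytes");
        } catch (OutOfMemoryError e) {
            System.out.println("allocation failed: " + e.getMessage());
        }
    }
}
```

Run it with something like `java -Xmx2g BigAllocationCheck` and again with a larger -Xmx, and watch how the numbers and the outcome change.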
Keep in mind that the process also needs a sizable chunk of memory to hold the code pages for your program, plus memory for the Java runtime itself. That alone may be a couple hundred megabytes, depending on the demands of the rest of your program.
It may be instructive to run your program while it allocates only a 1-element byte array, and inspect the process with Sysinternals' VMMap to get an idea of where your memory overhead comes from, excluding the large allocation.
Then give it a shot with your big allocation and see what you get.
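A minimal sketch of that baseline test program (the class name and the wait-on-stdin trick are mine, just to keep the process alive long enough to attach VMMap):

```java
import java.io.IOException;

// Allocate almost nothing, then block so you can attach Sysinternals VMMap
// to the process and see how much address space the JVM itself consumes.
public class BaselineFootprint {
    public static void main(String[] args) throws IOException {
        byte[] tiny = new byte[1];                  // the trivial 1-element allocation
        System.out.println("allocated " + tiny.length + " byte; attach VMMap now");
        System.out.println("press Enter to exit");
        System.in.read();                           // keep the process alive for inspection
    }
}
```

For the second run, replace `new byte[1]` with your real allocation size and compare the two VMMap snapshots.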