The memory usage for a thread can be conceptually divided into:
The thread's stack (the default is platform dependent, commonly around 1MB on 64-bit HotSpot; it can be changed using the -Xss VM option and/or requested per thread in the Thread constructor, where the VM treats the value as a hint)
The Java thread object and associated objects (located in the VM heap). Practically constant for a given implementation; only a few KB.
Native overhead - the kernel memory required to manage the thread. Should be negligible (a few KB).
User data managed by the thread (data reachable through its thread object or local variables). Can vary greatly.
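As a minimal sketch of the per-thread stack option from the list above: the four-argument Thread constructor accepts a requested stack size in bytes. The class name, method name, thread name and the 512KB figure are made up for illustration, and the VM may round, clamp or ignore the requested value.

```java
public class StackSizeHint {

    /**
     * Runs the task on a new thread that requests the given stack size
     * and waits for it to finish. Returns false if the wait was interrupted.
     */
    static boolean runWithStack(Runnable task, long stackSizeBytes) {
        // Thread(ThreadGroup, Runnable, String, long): the last argument is
        // the requested stack size in bytes. It is only a hint to the VM.
        Thread worker = new Thread(null, task, "custom-stack-worker", stackSizeBytes);
        worker.start();
        try {
            worker.join();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // Request a 512KB stack (illustrative value) for a single worker thread.
        boolean finished = runWithStack(
                () -> System.out.println("running on " + Thread.currentThread().getName()),
                512L * 1024);
        System.out.println("finished: " + finished);
    }
}
```

The -Xss option changes the default for all threads instead, so the constructor argument is mostly useful when only a few threads need an unusually small or large stack.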
The first three elements are easy to measure: they are practically constant within a VM instance and scale linearly with the number of threads. The last one depends entirely on the thread's code and data.
Since the stack size usually greatly dominates the per-thread cost (compared to the kernel overhead and the thread object), the memory impact of a waiting thread can be approximated by its stack size.
On virtual memory systems that is only the virtual impact (address space reserved), not necessarily the amount of physical memory assigned: stack pages that are never touched are never backed by physical memory. 32-bit systems can run out of address space very quickly when you create many threads (example: 1000 threads times 2MB stack size = 2GB).
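The arithmetic in that example can be sketched in a few lines of Java. The class and method names are hypothetical, and the result is only the address space reserved for stacks, not the physical memory actually used.

```java
public class StackAddressSpace {

    /** Address space reserved for thread stacks, in bytes (throws on long overflow). */
    static long reservedBytes(int threadCount, long stackSizeBytes) {
        return Math.multiplyExact((long) threadCount, stackSizeBytes);
    }

    public static void main(String[] args) {
        long stackSize = 2L * 1024 * 1024;          // 2MB per thread, as in the example
        long total = reservedBytes(1000, stackSize); // 1000 threads
        // Prints "2000 MB", i.e. roughly 2GB of address space.
        System.out.println((total / (1024 * 1024)) + " MB");
    }
}
```

On a 32-bit process that figure alone is close to the whole usable address space, which is why reducing the stack size is the usual remedy when many mostly idle threads are needed.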