Hundreds of thousands, per Go FAQ: Why goroutines instead of threads?:
It is practical to create hundreds of thousands of goroutines in the same address space.
The test test/chan/goroutines.go creates 10,000 and could easily do more, but is designed to run quickly; you can change the number on your system to experiment. You can easily run millions, given enough memory, such as on a server.
To understand the max number of goroutines, note that the per-goroutine cost is primarily the stack. Per FAQ again:
…goroutines, can be very cheap: they have little overhead beyond the memory for the stack, which is just a few kilobytes.
A back-of-the-envelope calculation is to assume that each goroutine has one 4 KiB page allocated for the stack (4 KiB is a pretty uniform size), plus some small overhead for a control block (like a Thread Control Block) for the runtime; this agrees with what you observed (in 2011, pre-Go 1.0). Thus 100 Ki goroutines would take about 400 MiB of memory, and 1 Mi goroutines would take about 4 GiB of memory, which is still manageable on a desktop, a bit much for a phone, and very manageable on a server. In practice the starting stack has ranged in size from half a page (2 KiB) to two pages (8 KiB), so this is approximately correct.
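To check the estimate on your own machine, here is a minimal sketch. The helper names `estimateBytes` and `measurePerGoroutine` are my own, not from the FAQ. The measurement uses the growth in `runtime.MemStats.Sys` while the goroutines are parked, which is only a rough proxy: it counts everything the runtime obtained from the OS, not just stacks, and the figure will vary by Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// estimateBytes is the back-of-the-envelope figure: n goroutines times
// stackBytes of stack each, ignoring the (small) control block.
func estimateBytes(n, stackBytes uint64) uint64 {
	return n * stackBytes
}

// measurePerGoroutine starts n goroutines that immediately park on a channel,
// and returns the growth in memory obtained from the OS divided by n.
// This is a rough measurement, not an exact per-goroutine cost.
func measurePerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	stop := make(chan struct{})
	var ready sync.WaitGroup
	ready.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			ready.Done()
			<-stop // stay parked so the stack remains allocated
		}()
	}
	ready.Wait() // all n goroutines exist at this point
	runtime.ReadMemStats(&after)
	close(stop) // release the goroutines

	if after.Sys < before.Sys {
		return 0
	}
	return (after.Sys - before.Sys) / uint64(n)
}

func main() {
	// Estimates assuming a 4 KiB starting stack.
	fmt.Printf("100 Ki goroutines: about %d MiB\n", estimateBytes(100<<10, 4<<10)>>20)
	fmt.Printf("1 Mi goroutines: about %d MiB\n", estimateBytes(1<<20, 4<<10)>>20)

	// Measured value; depends on Go version, OS, and architecture.
	fmt.Printf("measured: about %d bytes per goroutine\n", measurePerGoroutine(100000))
}
```

The two estimate lines print 400 MiB and 4096 MiB (4 GiB) respectively; the measured line is whatever your runtime reports, and should land in the same few-kilobyte range if the reasoning above holds.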
The starting stack size has changed over time; it started at 4 KiB (one page), then in 1.2 was increased to 8 KiB (2 pages), then in 1.4 was decreased to 2 KiB (half a page). These changes were due to segmented stacks causing performance problems when rapidly switching back and forth between segments ("hot stack split"): the size was increased to mitigate this (1.2), then decreased once segmented stacks were replaced with contiguous stacks (1.4):
Go 1.2 Release Notes: Stack size:
In Go 1.2, the minimum size of the stack when a goroutine is created has been lifted from 4KB to 8KB
Go 1.4 Release Notes: Changes to the runtime:
the default starting size for a goroutine's stack in 1.4 has been reduced from 8192 bytes to 2048 bytes.
Per-goroutine memory is largely stack, and it starts low and grows so you can cheaply have many goroutines. You could use a smaller starting stack, but then it would have to grow sooner (gain space at cost of time), and the benefits decrease due to the control block not shrinking. It is possible to eliminate the stack, at least when swapped out (e.g., do all allocation on heap, or save stack to heap on context switch), as in Erlang, though this hurts performance and adds complexity. You’d then need just the control block and saved context, allowing another factor of 5×–10× in number of goroutines, limited now by control block size and on-heap size of goroutine-local variables. However, this isn’t terribly useful, unless you need millions of tiny sleeping goroutines.
Since the main use of having many goroutines is for IO-bound tasks (concretely to process blocking syscalls, notably network or file system IO), you’re much more likely to run into OS limits on other resources, namely network sockets or file handles: golang-nuts › The max number of goroutines and file descriptors?. The usual way to address this is with a pool of the scarce resource, or more simply by just limiting the number via a semaphore; see Conserving File Descriptors in Go and Limiting Concurrency in Go.
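The semaphore approach can be sketched with a buffered channel, where the channel capacity is the maximum number of goroutines allowed to hold the scarce resource at once. This is a minimal illustration, not code from the linked articles: `runLimited`, its parameters, and the sleep standing in for IO are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited starts one goroutine per task index in [0, n), but lets at most
// limit of them run task at the same time. It returns the peak number of
// tasks that were observed running concurrently.
func runLimited(n, limit int, task func(i int)) int64 {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var active, peak int64

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks when limit are in use
			defer func() { <-sem }() // release the slot when done

			// Track the peak concurrency, for demonstration only.
			cur := atomic.AddInt64(&active, 1)
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}

			task(i)
			atomic.AddInt64(&active, -1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// Pretend each task holds a file descriptor for a millisecond.
	peak := runLimited(1000, 16, func(i int) {
		time.Sleep(time.Millisecond)
	})
	fmt.Println("peak concurrent tasks:", peak)
}
```

All 1000 goroutines exist at once, which is cheap, but no more than 16 are ever inside `task`, so the number of open descriptors stays bounded. If you would rather not create the waiting goroutines at all, acquire the slot in the loop before the `go` statement instead.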