On a 54-core machine, I use os/exec to spawn hundreds of client processes, and manage them with an abundance of goroutines.
Sometimes, but not always, I get this:
runtime: failed to create new OS thread (have 1306 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc
My ulimit is pretty high already:
$ ulimit -u
1828079
There's never a problem if I limit myself to, say, 54 clients.
Is there a way I can handle this situation more gracefully? E.g. not bomb out with a fatal error, and just do less/delayed work instead? Or query the system ahead of time and anticipate the maximum amount of stuff I can do (I don't just want to limit to the number of cores though)?
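To make the "do less/delayed work" idea concrete, here is a minimal sketch of capping how many clients run at once with a buffered channel used as a counting semaphore. The function name runBounded, the limit of 54, and the placeholder "true" command are all hypothetical, chosen only for illustration. This does not catch the fatal error (a runtime fatal error cannot be intercepted with recover); it only bounds how many goroutines are blocked waiting on a child process at any moment.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// runBounded runs every command in cmds, allowing at most limit of them
// to be in flight at the same time. It returns one error slot per command
// (nil on success). The name and signature are hypothetical.
func runBounded(cmds [][]string, limit int) []error {
	errs := make([]error, len(cmds))
	sem := make(chan struct{}, limit) // counting semaphore with `limit` slots
	var wg sync.WaitGroup

	for i, argv := range cmds {
		sem <- struct{}{} // blocks here while `limit` commands are already running
		wg.Add(1)
		go func(i int, argv []string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot once the child has exited
			errs[i] = exec.Command(argv[0], argv[1:]...).Run()
		}(i, argv)
	}

	wg.Wait()
	return errs
}

func main() {
	// Placeholder workload: 200 invocations of "true" stand in for real clients.
	cmds := make([][]string, 200)
	for i := range cmds {
		cmds[i] = []string{"true"}
	}

	failed := 0
	for _, err := range runBounded(cmds, 54) { // 54 is an arbitrary example limit
		if err != nil {
			failed++
		}
	}
	fmt.Printf("%d of %d commands failed\n", failed, len(cmds))
}
```

With this shape, extra work is simply delayed until a slot frees up, and the limit can be tuned independently of the core count.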
Given my large ulimit, should this error even be happening? Running grep -c goroutine on the stack output following the fatal error gives only 6087. Each client process (of which there are certainly fewer than 2000) might have a few goroutines of its own, but nothing crazy.
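Since the error message counts OS threads (1306) rather than goroutines, it may help to log both numbers while the program runs. Goroutines blocked in system calls, such as waiting on a child process, each occupy an OS thread, so the two counts can diverge. The sketch below reads the built-in "threadcreate" profile from runtime/pprof; the helper name threadsCreated is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
)

// threadsCreated reports how many OS threads the Go runtime has created
// so far, according to the built-in "threadcreate" profile.
// The helper name is hypothetical.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	// Print the goroutine count next to the thread count so the two can be
	// compared; GOMAXPROCS(0) queries the current setting without changing it.
	fmt.Printf("goroutines=%d threads_created=%d GOMAXPROCS=%d\n",
		runtime.NumGoroutine(), threadsCreated(), runtime.GOMAXPROCS(0))
}
```

Calling something like this periodically from the real program would show whether the thread count climbs toward the figure reported in the fatal error.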
Edit: the problem only occurs on machines with a high core count (~60). With everything else held constant and only the number of cores reduced to 30 (this is an OpenStack environment, so the same underlying hardware is still being used), these runtime errors don't occur.