
I have inherited a Perl script that, depending on machine configuration, fails during calls to fork with $? == 11.

According to errno.h and various posts, 11 is EAGAIN, i.e. "try again", because some resource was temporarily unavailable.

Is there a way to determine which resource caused the fork to fail, other than increasing various system limits one by one (open file descriptors, swap space, or number of allowable threads)?
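For reference, the usual suspects can be inspected directly before resorting to trial and error; a rough sketch for Linux (flags are standard bash/procps, but verify on your system):

```shell
# Per-user process/thread cap for this shell; fork fails with EAGAIN when it is hit
ulimit -u

# Processes and threads the current user is already running
ps --no-headers -L -u "$USER" | wc -l

# System-wide task ceiling
cat /proc/sys/kernel/threads-max
```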

phonybone
  • A forked process is a complete clone of the original process. Profiling your process before the fork will give you a clue how much of each resource it is using. – alvits May 20 '17 at 02:25
  • Do you mean $!=11? – ikegami May 20 '17 at 02:43
  • `Do you mean $!=11?`. Actually, I misspoke somewhat. What I should have said was that the fork is successful, but immediately ends with status code 11, as reported by `wait`. – phonybone May 21 '17 at 04:02
  • So your question has nothing to do with fork, so why did you accept my answer which is entirely about the errno set by fork on failure??? – ikegami May 21 '17 at 04:24
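As the exchange above suggests, a `$?` of 11 reported by `wait` is not fork failing with EAGAIN: `$?` packs the child's exit code and terminating signal together, and signal 11 is SIGSEGV. A minimal sketch to decode it (the deliberately crashing child is purely illustrative):

```perl
use strict;
use warnings;

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    kill 'SEGV', $$;   # illustrative only: make the child die from signal 11
    exit 0;
}
waitpid($pid, 0);

my $exit_code = $? >> 8;    # high byte: normal exit status
my $signal    = $? & 127;   # low 7 bits: terminating signal, if any
my $core      = $? & 128;   # core-dump flag
print "exit=$exit_code signal=$signal core=$core\n";
# A raw $? of 11 therefore means "killed by SIGSEGV",
# not "exited with status 11".
```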

2 Answers


Assuming you mean $! is EAGAIN, the fork man page on my system says:

EAGAIN: fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.

EAGAIN: It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered. To exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.

Are you trying to create a ton of processes? Are you reaping your children when they are done?
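If it really is fork that fails, `$!` tells you which case you hit; and reaping matters because zombie children still count against RLIMIT_NPROC. A minimal sketch (the child body is a placeholder):

```perl
use strict;
use warnings;

my @pids;
for my $i (1 .. 4) {
    my $pid = fork();
    if (!defined $pid) {
        # fork itself failed; $! holds the errno as both string and number
        die sprintf "fork failed: %s (errno %d)\n", $!, $! + 0;
    }
    if ($pid == 0) {
        exit 0;    # placeholder for real child work
    }
    push @pids, $pid;
}

# Reap every child so zombies don't accumulate against RLIMIT_NPROC
waitpid($_, 0) for @pids;
```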

ikegami
  • See my comment above for a clarification of the original question, which was an over-simplification on my part. The real sequence of events is as follows: after initialization, the program pre-processes a large (8 GB) file by means of some calls to system(). Then the program forks one process for each CPU core on the machine. So the program may be trying to duplicate a very large memory space when it forks (but Linux fork is copy-on-write, so that should be OK?). There is a large amount of RAM (184 GB), and I raised the number of allowable open files and increased swap space. – phonybone May 21 '17 at 04:10
  • Some of the forks succeed and some fail, and if I restrict the number of overall forks, then I can often get the program to run normally. – phonybone May 21 '17 at 04:16
  • Re "*the fork is successful*" and "*Some of the forks succeed and some fail*": which one is it? What call is failing? Why aren't you checking what error it's failing with? – ikegami May 21 '17 at 04:26

The error is due to the user hitting the per-user process limit (nproc), i.e. running out of allowed processes. Check the security configuration file on RHEL:

```
[root@server1 webapps]# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*       soft    nproc   1024
root    soft    nproc   unlimited
```

In my case, the "test" user was receiving the message "-bash: fork: retry: Resource temporarily unavailable".

Resolved the issue by adding a user-specific nproc limit:

```
[root@server1 webapps]# vi /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
*       soft    nproc   1024
test    soft    nproc   16384
root    soft    nproc   unlimited
```

Ishaq Khan