
I am trying to run a pipeline job that runs Scala tests on a slave machine I provision dynamically on Amazon.

Provision a machine --> build the code on provisioned machine --> run tests

A couple of minutes after my tests start, I hit a "Too many open files" error and the job quits. I tried changing the limits on the slave machine, but everything I do seems to be ignored as soon as the command runs through the Jenkins job.

Note: when I run the same command manually on the slave machine, it works like a charm, and I can see that the limits are indeed defined properly for the jenkins user I created on the system.

This is how I set everything up (the slave machine is CentOS 7.2).

As root, I have changed the /etc/security/limits.conf file and added:

*  hard    nofile     100000
*  soft    nofile     100000
jenkins  hard    nofile     100000
jenkins  soft    nofile     100000

I have also made sure there is nothing under /etc/security/limits.d/ that can override these values. (I originally created a /etc/security/limits.d/30-jenkins.conf with these values which didn't help)

I have made sure /etc/pam.d/login has the following line:

session    required   pam_limits.so

After a reboot, I made sure that running 'ulimit -a' as the jenkins user indeed shows the new values.
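For completeness, the check amounted to something like this (a sketch; `su -` forces a full login shell, which matters because pam_limits.so applies limits.conf only at login):

```shell
# Open-file limits as seen from a fresh login session for the jenkins user.
# pam_limits.so applies /etc/security/limits.conf at (PAM) login time,
# so checking from an already-running shell can be misleading.
su - jenkins -c 'ulimit -Sn && ulimit -Hn'
```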

I have created a simple pipeline job:

node('master') {
    stage('master-limits') {
        sh('ulimit -a')
    }
}
node('swarm') {
    stage('slave-limits') {
        sh('ulimit -a')
    }
}

When run, I am getting the following output:

[Pipeline] stage
[Pipeline] { (master-limits)
[Pipeline] sh
00:00:00.053 [ulimit-test] Running shell script
00:00:00.305 + ulimit -a
00:00:00.306 time(seconds)        unlimited
00:00:00.306 file(blocks)         unlimited
00:00:00.306 data(kbytes)         unlimited
00:00:00.306 stack(kbytes)        8192
00:00:00.306 coredump(blocks)     0
00:00:00.306 memory(kbytes)       unlimited
00:00:00.306 locked memory(kbytes) 64
00:00:00.306 process              64111
00:00:00.306 nofiles              65536
00:00:00.306 vmemory(kbytes)      unlimited
00:00:00.306 locks                unlimited
00:00:00.306 rtprio               0

[Pipeline] stage
[Pipeline] { (slave-limits)
[Pipeline] sh
00:00:00.348 [ulimit-test] Running shell script
00:00:00.606 + ulimit -a
00:00:00.606 time(seconds)        unlimited
00:00:00.606 file(blocks)         unlimited
00:00:00.606 data(kbytes)         unlimited
00:00:00.606 stack(kbytes)        8192
00:00:00.606 coredump(blocks)     0
00:00:00.606 memory(kbytes)       unlimited
00:00:00.606 locked memory(kbytes) 64
00:00:00.606 process              257585
00:00:00.606 nofiles              4096
00:00:00.606 vmemory(kbytes)      unlimited
00:00:00.606 locks                unlimited
00:00:00.606 rtprio               0

As you can see from the output, the nofiles limit on the slave still shows as 4096, even though I changed it to 100000.

The swarm client runs as a systemd service, configured as follows:

[Unit]
Description=Jenkins slave daemon
After=network.target

[Service]
User=jenkins
ExecStart=/usr/java/jdk1.8.0_112/bin/java -jar /var/lib/jenkins/swarm-client-3.3.jar -fsroot /var/lib/jenkins -username jenkins@jenkins.com -password * -name swarm -master http://jenkins.com -executors 1 -mode exclusive -labels swarm -disableSslVerification
ExecStop=/usr/bin/killall -w -s 2 java

[Install]
WantedBy=multi-user.target
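To see which limit the service process actually got (rather than what a login shell reports), /proc can be queried directly; the pgrep pattern below assumes the swarm-client jar name from the unit file above:

```shell
# The kernel records per-process limits in /proc/<pid>/limits;
# this shows what the running swarm client really has, regardless
# of what limits.conf says
pid=$(pgrep -f swarm-client)
grep 'Max open files' "/proc/$pid/limits"
```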

I am using Jenkins 2.62 with the Swarm plugin to set up slaves on demand, and Java 1.8.0_112 on both master and slave. The problem is consistent across both Ubuntu-based and CentOS-based slaves.

Am I missing something?

1 Answer


This issue eluded me as well until I started reading about the systemd init system. It turns out systemd does not honor the limits you set in /etc/security/limits.conf: those are applied by pam_limits.so at login, and services started by systemd never go through a PAM login session.

Instead, you need to set the limit in the unit file itself with the LimitNOFILE directive:

[Unit]
Description=Jenkins slave daemon
After=network.target

[Service]
User=jenkins
LimitNOFILE=100000
ExecStart=/usr/java/jdk1.8.0_112/bin/java -jar /var/lib/jenkins/swarm-client-3.3.jar -fsroot /var/lib/jenkins -username jenkins@jenkins.com -password * -name swarm -master http://jenkins.com -executors 1 -mode exclusive -labels swarm -disableSslVerification
ExecStop=/usr/bin/killall -w -s 2 java

[Install]
WantedBy=multi-user.target
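systemd only re-reads unit files on a daemon-reload, so the change takes effect after something like this (the unit name jenkins-slave.service is an assumption; substitute whatever yours is called):

```shell
# Re-read unit files and restart the service so LimitNOFILE applies
sudo systemctl daemon-reload
sudo systemctl restart jenkins-slave.service
# Confirm from the service's main PID that the limit really changed
pid=$(systemctl show -p MainPID jenkins-slave.service | cut -d= -f2)
grep 'Max open files' "/proc/$pid/limits"
```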

Found the answer here: https://serverfault.com/a/678861
