I am trying to run a pipeline job that runs Scala tests on a slave machine I provision dynamically on Amazon:
Provision a machine --> build the code on the provisioned machine --> run the tests
A couple of minutes after my tests start, I hit a "Too many open files" error and the job quits. I have tried raising the limits on the slave machine, but everything I change seems to be ignored as soon as it runs through the Jenkins job.
Note: when I run the same command manually on the slave machine, it works like a charm, and I can see that the limits are applied correctly for the jenkins user I created on the system.
This is how I set everything up (the slave machine is CentOS 7.2):
As root, I have changed the /etc/security/limits.conf file and added:
* hard nofile 100000
* soft nofile 100000
jenkins hard nofile 100000
jenkins soft nofile 100000
I have also made sure there is nothing under /etc/security/limits.d/ that could override these values. (I originally created /etc/security/limits.d/30-jenkins.conf with the same values, which didn't help.)
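The check for overrides was essentially just a recursive grep, something along these lines:
grep -r nofile /etc/security/limits.d/    # look for any competing nofile entries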
I have made sure /etc/pam.d/login has the following line:
session required pam_limits.so
After a reboot, I have made sure that running 'ulimit -a' as the jenkins user indeed shows the new values.
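By "as the jenkins user" I mean a proper login shell on the slave, roughly like this (the exact invocation isn't important):
su - jenkins -c 'ulimit -n'    # prints 100000 here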
I have created a simple pipeline job:
node('master') {
    stage('master-limits') {
        sh('ulimit -a')
    }
}
node('swarm') {
    stage('slave-limits') {
        sh('ulimit -a')
    }
}
When the job runs, I get the following output:
[Pipeline] stage
[Pipeline] { (master-limits)
[Pipeline] sh
00:00:00.053 [ulimit-test] Running shell script
00:00:00.305 + ulimit -a
00:00:00.306 time(seconds) unlimited
00:00:00.306 file(blocks) unlimited
00:00:00.306 data(kbytes) unlimited
00:00:00.306 stack(kbytes) 8192
00:00:00.306 coredump(blocks) 0
00:00:00.306 memory(kbytes) unlimited
00:00:00.306 locked memory(kbytes) 64
00:00:00.306 process 64111
00:00:00.306 nofiles 65536
00:00:00.306 vmemory(kbytes) unlimited
00:00:00.306 locks unlimited
00:00:00.306 rtprio 0
[Pipeline] stage
[Pipeline] { (slave-limits)
[Pipeline] sh
00:00:00.348 [ulimit-test] Running shell script
00:00:00.606 + ulimit -a
00:00:00.606 time(seconds) unlimited
00:00:00.606 file(blocks) unlimited
00:00:00.606 data(kbytes) unlimited
00:00:00.606 stack(kbytes) 8192
00:00:00.606 coredump(blocks) 0
00:00:00.606 memory(kbytes) unlimited
00:00:00.606 locked memory(kbytes) 64
00:00:00.606 process 257585
00:00:00.606 nofiles 4096
00:00:00.606 vmemory(kbytes) unlimited
00:00:00.606 locks unlimited
00:00:00.606 rtprio 0
As you can see from the output, the open-files limit (nofiles) on the slave still shows 4096, even though I changed it to 100000.
My swarm client is running as a systemd service, configured like this:
[Unit]
Description=Jenkins slave daemon
After=network.target
[Service]
User=jenkins
ExecStart=/usr/java/jdk1.8.0_112/bin/java -jar /var/lib/jenkins/swarm-client-3.3.jar -fsroot /var/lib/jenkins -username jenkins@jenkins.com -password * -name swarm -master http://jenkins.com -executors 1 -mode exclusive -labels swarm -disableSslVerification
ExecStop=/usr/bin/killall -w -s 2 java
[Install]
WantedBy=multi-user.target
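For what it's worth, looking at the limits of the running swarm client process directly should show what the service actually gets; something along these lines (assuming the swarm client JVM is the only process on the box matching swarm-client):
cat /proc/$(pgrep -f swarm-client)/limits | grep -i 'open files'    # run on the slave while the service is up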
I am using Jenkins 2.62 with the Swarm plugin to set up slaves on demand, and Java 1.8.0_112 on both the master and the slave. The problem is the same whether the slave is Ubuntu-based or CentOS-based.
Am I missing something?