I have a Node.js service that I want to build with Jenkins in Kubernetes, using a Jenkins agent pod specified by the Node.js project. I am trying to eliminate manual configuration in the Jenkins UI. Everything is running in one Kubernetes cluster.

I am following this blog and adapting it slightly, but I am running into two problems:

  1. I get an error: ‘Jenkins’ doesn’t have label ‘test-pod’
  2. The job loops infinitely.

The build agent pod is successfully created in Kubernetes. The test-pod label is specified by the Jenkinsfile, so I don't know why I get this error. And why does the job loop infinitely?

podTemplate(
    name: 'test-pod',
    label: 'test-pod',
    containers: [
        containerTemplate(name: 'node14', image: 'node:14-alpine'),
        containerTemplate(name: 'docker', image:'trion/jenkins-docker-client'),
    ],
    {
        node('test-pod') {
            stage('Build'){
                container('node14') {
                    // do nothing just yet
                }
            }
        }
    }
)

Here is part of the Jenkins console output:

Started by user admin
Obtained Jenkinsfile from git ssh://git@kube-master.cluster.dev/git/hello.git
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Created Pod: kubernetes jenkins/test-pod-2hdfp-9kcjj
[Normal][jenkins/test-pod-2hdfp-9kcjj][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-9kcjj to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-9kcjj][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-9kcjj][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-9kcjj][Started] Started container jnlp
jenkins/test-pod-2hdfp-9kcjj Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-9kcjj][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-gc2qb
[Normal][jenkins/test-pod-2hdfp-gc2qb][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-gc2qb to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-gc2qb][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-gc2qb][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-gc2qb][Started] Started container jnlp
jenkins/test-pod-2hdfp-gc2qb Container node14 was terminated (Exit Code: 0, Reason: Completed)
Still waiting to schedule task
‘Jenkins’ doesn’t have label ‘test-pod’
[Normal][jenkins/test-pod-2hdfp-gc2qb][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-xwkm2
[Normal][jenkins/test-pod-2hdfp-xwkm2][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-xwkm2 to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-xwkm2][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-xwkm2][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-xwkm2][Started] Started container jnlp
jenkins/test-pod-2hdfp-xwkm2 Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-xwkm2][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-4ltq3
[Normal][jenkins/test-pod-2hdfp-4ltq3][Scheduled] Successfully assigned jenkins/test-pod-2hdfp-4ltq3 to kube-worker2.cluster.dev
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "node:14-alpine" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container node14
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container node14
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "trion/jenkins-docker-client" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container docker
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container docker
[Normal][jenkins/test-pod-2hdfp-4ltq3][Pulled] Container image "jenkins/inbound-agent:4.3-4" already present on machine
[Normal][jenkins/test-pod-2hdfp-4ltq3][Created] Created container jnlp
[Normal][jenkins/test-pod-2hdfp-4ltq3][Started] Started container jnlp
jenkins/test-pod-2hdfp-4ltq3 Container node14 was terminated (Exit Code: 0, Reason: Completed)
[Normal][jenkins/test-pod-2hdfp-4ltq3][Killing] Stopping container docker
Created Pod: kubernetes jenkins/test-pod-2hdfp-0216w
...

Update with latest findings

Master log (see debugging) doesn't provide much else:

...
2021-04-30 11:52:42.715+0000 [id=4660]  INFO    hudson.slaves.NodeProvisioner#lambda$update$6: test-pod-gb4vq-hf3d4 provisioning successfully completed. We have now 2 computer(s)
2021-04-30 11:52:42.741+0000 [id=4659]  INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes jenkins/test-pod-gb4vq-hf3d4
2021-04-30 11:52:42.847+0000 [id=4680]  WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: test-pod-gb4vq-pdd69, template=PodTemplate{id='f29ecbdd-9c1d-468f-86ff-dd46ff40f306', name='test-pod-gb4vq', namespace='jenkins', label='test-pod', containers=[ContainerTemplate{name='node14', image='node:14-alpine'}, ContainerTemplate{name='docker', image='trion/jenkins-docker-client'}], annotations=[PodAnnotation{key='buildUrl', value='http://172.16.1.12/job/hello/14/'}, PodAnnotation{key='runUrl', value='job/hello/14/'}]}
java.lang.IllegalStateException: Pod is no longer available: jenkins/test-pod-gb4vq-pdd69
...

except that it suggests the container is starting up, then failing. The loop appears to happen because the error handling in the Kubernetes plug-in does not catch this failure and fail the job.

By watching for the build pod (using k9s) I am able to capture the pod's log. The Unknown client name error in it also looks like a consequence of the container terminating so quickly:

 jnlp INFO: [JNLP4-connect connection to 172.16.1.12/172.16.1.12:50000] Local headers refused by remote: Unknown client name: test-pod-34sd7-5xhs2
 jnlp Apr 29, 2021 10:42:15 PM hudson.remoting.jnlp.Main$CuiListener status
 jnlp INFO: Protocol JNLP4-connect encountered an unexpected exception
 jnlp java.util.concurrent.ExecutionException: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: test-pod-34sd7-5xhs2
 jnlp     at org.jenkinsci.remoting.util.SettableFuture.get(SettableFuture.java:223)
 jnlp     at hudson.remoting.Engine.innerRun(Engine.java:743)
 jnlp     at hudson.remoting.Engine.run(Engine.java:518)
 jnlp Caused by: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name: test-pod-34sd7-5xhs2
 jnlp     at org.jenkinsci.remoting.protocol.impl.ConnectionHeadersFilterLayer.newAbortCause(ConnectionHeadersFilterLayer.java:378)

Just found a similar issue

This is useful: I added podRetention: always() to podTemplate(), after label, so the agent pods are not deleted when they fail. They now remain in the cluster with status Error.

Good finding

With the pod now retained on error, I can find /var/log/containers/<failed pod>.log, and it has led me to a root cause.

2021-04-30T08:59:36.047989534-04:00 stderr F java.net.UnknownHostException: updates.jenkins.io

This is caused by the pod's dnsPolicy, which limits DNS to cluster-only lookups. The fix is to add hostNetwork: true to podTemplate(), next to label.

Next, the trion/jenkins-docker-client image recommended by the blog is both a client and a server, so it is the wrong image.

Switching to jenkins/agent creates a new problem. The pod now goes up and down doing nothing, not even logging. I suspect this is a launch parameter issue.

Now it is clear I shouldn't have a Jenkins container in the Jenkinsfile at all, because the Kubernetes plug-in automatically starts a JNLP container.

That means the problem is, at last, the node14 container - which is either erroring immediately or finding nothing to do and terminating immediately.

jws
  • The label error is in fact not an error, but it indicates that something might be misconfigured (see [here](https://github.com/jenkinsci/docker-plugin/issues/574#issuecomment-375619730)). There is a line just before it saying that the container was terminated. That is something you should look into. Do you have any other logs that might help? (Sorry, I'm not that familiar with Jenkins.) – acid_fuji Apr 30 '21 at 08:49
  • Good link. I have to be very fast to catch the log in the pod. jnlp doesn't know the pod Jenkins just created for itself: org.jenkinsci.remoting.protocol.impl.ConnectionRefusalException: Unknown client name. Now looking for the reason for this. – jws Apr 30 '21 at 11:37
  • Unknown client name is related to the infinite loop problem. Because error handling in the Kubernetes plug-in doesn't stop on the first error, it retries after the build pod has been terminated, leading to this Unknown client name error. – jws Apr 30 '21 at 16:03

1 Answer

The Kubernetes plug-in's error handling is difficult to understand and troubleshoot, and the blog is wrong.

Start with a bare minimum working agent Jenkinsfile:

podTemplate(
    name: 'build-pod',
    namespace: 'jenkins',
    podRetention: always()  // keep the pod after the build, for debugging
) {
    node(POD_LABEL) {
        stage('Build') {
            sh 'echo hello'
        }
    }
}

From there, extend it with containers, volumes, container build sections, etc. one step at a time.
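
As a first extension, here is a minimal sketch that adds the node14 container from the question. It assumes the container exited because its image's default command finishes right away, so it gives the container a long-running command (cat with a TTY, a common pattern with the Kubernetes plug-in). The node --version step is only a placeholder, and none of this has been verified against the cluster in the question.

```groovy
// Sketch only: container name and image come from the question;
// command/ttyEnabled and the placeholder build step are assumptions.
podTemplate(
    name: 'build-pod',
    namespace: 'jenkins',
    podRetention: always(),  // keep the pod after the build, for debugging
    containers: [
        // Without a long-running command the container completes at once
        // (Exit Code: 0, Reason: Completed) and the agent pod is torn down.
        containerTemplate(
            name: 'node14',
            image: 'node:14-alpine',
            command: 'cat',
            ttyEnabled: true
        )
    ]
) {
    node(POD_LABEL) {
        stage('Build') {
            // Run the step inside the node14 container, not the jnlp one.
            container('node14') {
                sh 'node --version'
            }
        }
    }
}
```

No jnlp container is declared here, because the plug-in adds one automatically.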

Troubleshoot using the logs:

Run kubectl get pods -n jenkins to find the pod name, then kubectl logs -f <jenkins-pod> -n jenkins to follow its log.

(This assumes jenkins is your Kubernetes namespace.)

jws