Jenkins fails to build a dynamic slave (agent) node

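Problem

Jenkins runs outside the cluster and connects to k8s to create dynamic slaves. When the Pod Templates configuration defines a container, the container is not created successfully.
If no container is added to the Pod Templates and the default jnlp is used, a simple demo runs successfully.
As soon as a custom container is defined, the run fails with an error.

I searched beforehand and did not find a solution. I am also not sure whether something in my configuration is wrong.

Could you please take a look, or suggest a direction to investigate?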

Error messages

Jenkins job error output:

[image: screenshot of the job error]

jenkins log:

2023-03-28 07:06:38.793+0000 [id=46]    INFO    hudson.slaves.NodeProvisioner#update: demo-cw0pm provisioning successfully completed. We have now 2 computer(s)
2023-03-28 07:06:38.850+0000 [id=11288] INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes jenkins/demo-cw0pm
2023-03-28 07:06:40.638+0000 [id=3075]  INFO    o.c.j.p.k.p.r.Reaper$TerminateAgentOnContainerTerminated#lambda$onEvent$1: jenkins/demo-cw0pm Container jnlp was just terminated, so removing the corresponding Jenkins agent
2023-03-28 07:06:40.654+0000 [id=3075]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent demo-cw0pm
2023-03-28 07:06:40.662+0000 [id=3075]  INFO    o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent jenkins/demo-cw0pm
2023-03-28 07:06:40.663+0000 [id=3075]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer demo-cw0pm
2023-03-28 07:06:40.665+0000 [id=3075]  INFO    o.c.j.p.k.p.r.Reaper$TerminateAgentOnPodFailed#onEvent: jenkins/demo-cw0pm Pod just failed. Removing the corresponding Jenkins agent. Reason: null, Message: null
2023-03-28 07:06:40.667+0000 [id=3075]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent demo-cw0pm
2023-03-28 07:06:40.667+0000 [id=3075]  SEVERE  o.c.j.p.k.KubernetesSlave#_terminate: Computer for agent is null: demo-cw0pm
2023-03-28 07:06:40.667+0000 [id=3075]  INFO    hudson.slaves.AbstractCloudSlave#terminate: FATAL: Computer for agent is null: demo-cw0pm
2023-03-28 07:06:48.793+0000 [id=39]    INFO    hudson.slaves.NodeProvisioner#update: demo-flqt0 provisioning successfully completed. We have now 2 computer(s)
2023-03-28 07:06:48.795+0000 [id=11287] INFO    o.c.j.p.k.KubernetesLauncher#launch: Created Pod: kubernetes jenkins/demo-flqt0
2023-03-28 07:06:49.701+0000 [id=11287] INFO    o.c.j.p.k.KubernetesLauncher#launch: Pod is running: kubernetes jenkins/demo-flqt0
2023-03-28 07:06:50.709+0000 [id=3075]  INFO    o.c.j.p.k.p.r.Reaper$TerminateAgentOnContainerTerminated#lambda$onEvent$1: jenkins/demo-flqt0 Container jnlp was just terminated, so removing the corresponding Jenkins agent
2023-03-28 07:06:50.716+0000 [id=3075]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent demo-flqt0
2023-03-28 07:06:50.723+0000 [id=3075]  INFO    o.c.j.p.k.KubernetesSlave#deleteSlavePod: Terminated Kubernetes instance for agent jenkins/demo-flqt0
2023-03-28 07:06:50.724+0000 [id=3075]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Disconnected computer demo-flqt0
2023-03-28 07:06:50.725+0000 [id=3075]  INFO    o.c.j.p.k.p.r.Reaper$TerminateAgentOnPodFailed#onEvent: jenkins/demo-flqt0 Pod just failed. Removing the corresponding Jenkins agent. Reason: null, Message: null
2023-03-28 07:06:50.728+0000 [id=3075]  INFO    o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent demo-flqt0
2023-03-28 07:06:50.728+0000 [id=3075]  SEVERE  o.c.j.p.k.KubernetesSlave#_terminate: Computer for agent is null: demo-flqt0
2023-03-28 07:06:50.728+0000 [id=3075]  INFO    hudson.slaves.AbstractCloudSlave#terminate: FATAL: Computer for agent is null: demo-flqt0
2023-03-28 07:06:51.708+0000 [id=11287] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: demo-flqt0, template=PodTemplate{id='4e7578d1-fdd7-4d0d-9d08-2c5a06e5ed18', name='demo', namespace='jenkins', label='demo', containers=[ContainerTemplate{name='jnlp', image='jenkins/inbound-agent:4.3-4-alpine', workingDir='/home/jenkins/agent', command='/bin/sh -c', args='jenkins-agent', ttyEnabled=true, resourceRequestCpu='', resourceRequestMemory='', resourceRequestEphemeralStorage='', resourceLimitCpu='', resourceLimitMemory='', resourceLimitEphemeralStorage='', livenessProbe=ContainerLivenessProbe{execArgs='', timeoutSeconds=0, initialDelaySeconds=0, failureThreshold=0, periodSeconds=0, successThreshold=0}}]}
java.lang.IllegalStateException: Node was deleted, computer is null
        at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:189)
        at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:298)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

jenkins-log.log (55.8 KB)

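Environment

Deployed on a cloud server.
k8s and Jenkins run on the same machine. Jenkins is started with Docker and connects to k8s using RBAC authentication.

Configuration

Pod Templates configuration:

A simple pipeline demo: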

pipeline {
    agent {
        // label "tests"
        label "demo"
    }
    stages {
        stage('jnlp container') {
            steps {
                // Run the step inside the jnlp container of the agent pod
                container('jnlp') {
                    sh '''
                    java -version
                    '''
                }
            }
        }
    }
}

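In the Pod Templates configuration, add ${computer.jnlpmac} ${computer.name} to the command arguments.

After adding the arguments and running again, I get the error below. Could you please take another look?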

2023-03-29 09:00:04.684+0000 [id=474]   INFO    h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #109 from /43.138.76.173:16574
2023-03-29 09:00:04.749+0000 [id=117]   INFO    j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting [#11] for tests-pp2n6 terminated: java.nio.channels.ClosedChannelException
2023-03-29 09:00:04.920+0000 [id=17]    WARNING hudson.security.csrf.CrumbFilter#doFilter: Found invalid crumb db6d1265560a006ad1782c122f39e05c59f44cad78f638077c0ce9abfca7b436e8015f1260976966f1b8d3f341fff04d58ed5911873470e5cd7a3f0dc1f46678. If you are calling this URL with a script, please use the API Token instead. More information: https://www.jenkins.io/redirect/crumb-cannot-be-used-for-script
2023-03-29 09:00:04.920+0000 [id=17]    WARNING hudson.security.csrf.CrumbFilter#doFilter: No valid crumb was included in request for /ajaxBuildQueue by liaoliao. Returning 403.
2023-03-29 09:00:04.921+0000 [id=13]    WARNING hudson.security.csrf.CrumbFilter#doFilter: Found invalid crumb db6d1265560a006ad1782c122f39e05c59f44cad78f638077c0ce9abfca7b436e8015f1260976966f1b8d3f341fff04d58ed5911873470e5cd7a3f0dc1f46678. If you are calling this URL with a script, please use the API Token instead. More information: https://www.jenkins.io/redirect/crumb-cannot-be-used-for-script
2023-03-29 09:00:04.921+0000 [id=13]    WARNING hudson.security.csrf.CrumbFilter#doFilter: No valid crumb was included in request for /ajaxExecutors by liaoliao. Returning 403.
2023-03-29 09:00:05.466+0000 [id=41]    INFO    c.c.j.p.k.KubernetesCredentialProvider#startWatchingForSecrets: retrieving secrets with selector: jenkins.io/credentials-type, LabelSelector(matchExpressions=[], matchLabels={}, additionalProperties={})
2023-03-29 09:00:05.470+0000 [id=41]    SEVERE  c.c.j.p.k.KubernetesCredentialProvider#startWatchingForSecrets: Failed to initialise k8s secret provider, secrets from Kubernetes will not be available
java.net.UnknownHostException: kubernetes.default.svc: Name or service not known
        at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
        at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)
        at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)
        at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
        at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)
        at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)
        at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)
        at okhttp3.Dns$Companion$DnsSystem.lookup(Dns.kt:49)
        at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.kt:164)
        at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.kt:129)
        at okhttp3.internal.connection.RouteSelector.next(RouteSelector.kt:71)
        at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:205)
        at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106)
        at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74)
        at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
        at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:517)
Caused: java.io.IOException: kubernetes.default.svc: Name or service not known
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:427)
Caused: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [list]  for kind: [Secret]  with name: [null]  in namespace: [null]  failed.
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:429)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:392)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.list(BaseOperation.java:93)
        at com.cloudbees.jenkins.plugins.kubernetes_credentials_provider.KubernetesCredentialProvider.startWatchingForSecrets(KubernetesCredentialProvider.java:124)
        at com.cloudbees.jenkins.plugins.kubernetes_credentials_provider.KubernetesCredentialProvider$1.doRun(KubernetesCredentialProvider.java:186)
        at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
        at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
2023-03-29 09:00:05.470+0000 [id=41]    INFO    c.c.j.p.k.KubernetesCredentialProvider#reconnectLater: Attempting to reconnect Kubernetes client in 5 mins
2023-03-29 09:00:09.918+0000 [id=17]    WARNING hudson.security.csrf.CrumbFilter#doFilter: Found invalid crumb db6d1265560a006ad1782c122f39e05c59f44cad78f638077c0ce9abfca7b436e8015f1260976966f1b8d3f341fff04d58ed5911873470e5cd7a3f0dc1f46678. If you are calling this URL with a script, please use the API Token instead. More information: https://www.jenkins.io/redirect/crumb-cannot-be-used-for-script
2023-03-29 09:00:09.919+0000 [id=17]    WARNING hudson.security.csrf.CrumbFilter#doFilter: No valid crumb was included in request for /ajaxBuildQueue by liaoliao. Returning 403.
2023-03-29 09:00:09.919+0000 [id=92]    WARNING hudson.security.csrf.CrumbFilter#doFilter: Found invalid crumb db6d1265560a006ad1782c122f39e05c59f44cad78f638077c0ce9abfca7b436e8015f1260976966f1b8d3f341fff04d58ed5911873470e5cd7a3f0dc1f46678. If you are calling this URL with a script, please use the API Token instead. More information: https://www.jenkins.io/redirect/crumb-cannot-be-used-for-script
2023-03-29 09:00:09.919+0000 [id=92]    WARNING hudson.security.csrf.CrumbFilter#doFilter: No valid crumb was included in request for /ajaxExecutors by liaoliao. Returning 403.
2023-03-29 09:00:12.517+0000 [id=105]   INFO    o.c.j.p.k.KubernetesLauncher#launch: Waiting for agent to connect (300/1,000): tests-pp2n6
2023-03-29 09:00:14.760+0000 [id=482]   INFO    h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #110 from /43.138.76.173:52880
2023-03-29 09:00:14.823+0000 [id=132]   INFO    j.s.DefaultJnlpSlaveReceiver#channelClosed: Computer.threadPoolForRemoting [#13] for tests-pp2n6 terminated: java.nio.channels.ClosedChannelException
2023-03-29 09:00:14.918+0000 [id=242]   WARNING hudson.security.csrf.CrumbFilter#doFilter: Found invalid crumb db6d1265560a006ad1782c122f39e05c59f44cad78f638077c0ce9abfca7b436e8015f1260976966f1b8d3f341fff04d58ed5911873470e5cd7a3f0dc1f46678. If you are calling this URL with a script, please use the API Token instead. More information: https://www.jenkins.io/redirect/crumb-cannot-be-used-for-script
2023-03-29 09:00:14.919+0000 [id=242]   WARNING hudson.security.csrf.CrumbFilter#doFilter: No valid crumb was included in request for /ajaxBuildQueue by liaoliao. Returning 403.

Log file:
jenkins-log2.log (284.0 KB)

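Based on the error "java.net.UnknownHostException: kubernetes.default.svc: Name or service not known", the likely cause is a DNS/network configuration problem or a permissions problem. Try checking the following:

1) Run nslookup kubernetes.default.svc inside the Jenkins container to see whether it can resolve the Kubernetes service name.

2) Run curl https://kubernetes.default.svc inside the Jenkins container to see whether the Kubernetes API Server is up and correctly configured.

3) Run kubectl get pods inside the Jenkins container to check that the Jenkins container has sufficient RBAC permissions for access.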
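
The three checks above can be sketched as a small script. This is only a sketch under a few assumptions: "jenkins" is a hypothetical container name (use whatever docker ps shows), the jenkins namespace is taken from the pod names in the log, and nslookup, curl, and kubectl may not be installed in the Jenkins image. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: prints each check by default; set DRY_RUN=0 to execute them.
# "jenkins" is a hypothetical container name; replace it with the real one.
JENKINS_CONTAINER="${JENKINS_CONTAINER:-jenkins}"
DRY_RUN="${DRY_RUN:-1}"

# Run a command inside the Jenkins container, or just print it in dry-run mode.
in_jenkins() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker exec $JENKINS_CONTAINER $*"
  else
    docker exec "$JENKINS_CONTAINER" "$@"
  fi
}

# 1) Can the container resolve the Kubernetes service name?
in_jenkins nslookup kubernetes.default.svc

# 2) Does the API server answer? -k skips TLS verification (diagnosis only).
in_jenkins curl -k https://kubernetes.default.svc

# 3) Are the RBAC permissions sufficient? Namespace assumed from the log.
in_jenkins kubectl get pods -n jenkins
```

Since Jenkins here runs in Docker outside the cluster, a failure in step 1 may simply mean that kubernetes.default.svc is only served by the cluster's internal DNS. In that case it may be worth repeating steps 2 and 3 against the API server address configured in the Jenkins cloud settings.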