
I am running a Spring Batch application in a Kubernetes environment. The k8s cluster has one master node and three worker nodes. I am testing Spring Batch under high load, which spawns around 100 worker pods. However, all 100 pods come up on only two of the three worker nodes. No node selector or additional labeling has been applied to the nodes.

I have used Spring Cloud Deployer Kubernetes to create the worker pods.

The versions involved are:

  • Spring Boot: 2.1.9.RELEASE
  • Spring Cloud: 2020.0.1
  • Spring Cloud Deployer: 2.5.0
  • Spring Cloud Task: 2.1.1.RELEASE
  • Kubernetes: 1.21

How can I ensure that worker pods get scheduled on all available worker nodes evenly?

Following is the partition handler implementation responsible for launching the tasks.

@Bean
public PartitionHandler partitionHandler(TaskLauncher taskLauncher, JobExplorer jobExplorer) {

    Resource resource = this.resourceLoader.getResource(resourceSpec);

    DeployerPartitionHandler partitionHandler = new DeployerPartitionHandler(taskLauncher, jobExplorer, resource,
        "worker");

    List<String> commandLineArgs = new ArrayList<>();
    commandLineArgs.add("--spring.profiles.active=worker");
    commandLineArgs.add("--spring.cloud.task.initialize.enable=false");
    commandLineArgs.add("--spring.batch.initializer.enabled=false");
    commandLineArgs.add("--spring.cloud.task.closecontext_enabled=true");
    commandLineArgs.add("--logging.level.root=DEBUG");

    partitionHandler.setCommandLineArgsProvider(new PassThroughCommandLineArgsProvider(commandLineArgs));
    partitionHandler.setEnvironmentVariablesProvider(environmentVariablesProvider());
    partitionHandler.setApplicationName(appName + "worker");
    partitionHandler.setMaxWorkers(maxWorkers);

    return partitionHandler;
}

@Bean
public EnvironmentVariablesProvider environmentVariablesProvider() {
    return new SimpleEnvironmentVariablesProvider(this.environment);
}
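For reference, the uneven spread can be confirmed by counting running pods per node. This is a generic `kubectl` sketch (add `-n <namespace>` or a label selector for the worker pods as appropriate to your setup):

```shell
# The NODE column is field 7 of `kubectl get pods -o wide`;
# group by node and count to see how pods are distributed.
kubectl get pods -o wide --no-headers \
  | awk '{print $7}' | sort | uniq -c | sort -rn
```

With all 100 workers on two nodes, the output would show two large counts and the third node missing entirely.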
  • Please check whether you have taints and tolerations applied to any node ? https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/ – Chandra Sekar Sep 03 '21 at 04:35
  • No, only the master node have taint for NoSchedule. Worker nodes don't have any taint and toleration applied to them. – Abhinav Sharma Sep 03 '21 at 04:57
  • +1 to checking Node affinity of your Pods. If nothing is set at that level, do the nodes have the same resources (ie are you sure k8s can schedule workers on each node)? I'm thinking of one node not having enough resources to get workers scheduled on it. – Mahmoud Ben Hassine Sep 03 '21 at 07:07
  • All the 3 nodes are of same configuration and no node selector or affinity is being applied. – Abhinav Sharma Sep 03 '21 at 07:41
  • @AbhinavSharma What is the cluster setup? Cloud/on-premise? Was it setup using `kubeadm`? – moonkotte Sep 03 '21 at 15:01
  • It is an on-premise setup with kubeadm. – Abhinav Sharma Sep 03 '21 at 15:06
  • Have you specified Nodeselector? Refer to: https://stackoverflow.com/questions/62127431/conditionally-launch-spring-cloud-task-on-a-specific-node-of-kubernetes-cluster and https://stackoverflow.com/questions/56776162/setting-node-selector-for-spring-cloud-dataflow-task-and-stream-deployments-on-k – Mayank S Sep 05 '21 at 10:23
  • No, node selectors are not specified – Abhinav Sharma Sep 06 '21 at 02:00
  • @AbhinavSharma Have you managed to resolve this? Maybe try to rejoin the node? If no taints/tolerations and nodeselectors are set up, pods should be scheduled freely. Or you can try to drain one of the nodes so pods will be forced to rescheduled on other nodes. – moonkotte Sep 09 '21 at 09:18
  • 1
    Re-joining the nodes seems to have solved the issue. – Abhinav Sharma Sep 13 '21 at 08:06

1 Answer


Posting this out of comments as a community wiki for better visibility, feel free to edit and expand.


There are scheduling mechanics which can prevent pods from being scheduled on some nodes:

  • Taints and tolerations
  • Node affinity / anti-affinity
  • Node selectors
  • Insufficient resources on a node

If none of these is set, it's worth trying to rejoin the node. For instance, it might not have been registered correctly (this solved the issue above).
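The checks and the rejoin procedure discussed in the comments can be sketched as follows. These are standard `kubectl`/`kubeadm` commands, but exact flags may vary with cluster version and setup; `<node-name>` is a placeholder:

```shell
# 1. Verify no taints block scheduling on the worker nodes
#    (empty output next to a node name means no taints).
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'

# 2. Check that every node is Ready and not cordoned (SchedulingDisabled).
kubectl get nodes

# 3. Drain the suspect node so existing pods are rescheduled elsewhere.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# 4. On the worker node itself: reset its cluster membership.
sudo kubeadm reset

# 5. On the control-plane node: print a fresh join command ...
kubeadm token create --print-join-command
#    ... then run the printed `kubeadm join ...` on the worker node.

# 6. Re-check: the node should be Ready and schedulable again.
kubectl get nodes
```

After rejoining, re-running the high-load test should show worker pods landing on all three nodes.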
