
I am trying to lift an OptaPlanner project into the cloud as an Azure Function. My goal is to improve scaling so that our company can process more solutions in parallel.

Background: We currently have a project running in a Docker container using the optaplanner-spring-boot-starter MVN package. This has been successful when limited to solving one solution at a time. However, we need to dramatically scale the system so that a higher number of solutions can be solved in a limited time frame. Therefore, I'm looking for a cloud-based solution for the extra CPU resources needed.

I created an Azure Function using the optaplanner-core MVN package and our custom domain objects for our existing solution as a proof of concept. The Azure Function uses an HTTP trigger; it works and returns a solution, but performance is seriously degraded. I expect to need to upgrade the Consumption plan so that we can specify CPU and memory requirements. However, it appears that Azure is not scaling out additional instances as expected, leading to OptaPlanner blocking itself.

Here is the driver code:

@FunctionName("solve")
public HttpResponseMessage run(
    @HttpTrigger(name = "req", methods = {HttpMethod.POST },authLevel = AuthorizationLevel.FUNCTION) 
    HttpRequestMessage<Schedule> request,
    final ExecutionContext context) {

    SolverConfig config = SolverConfig.createFromXmlResource("solverConfig.xml");
    
    //SolverManagerConfig managerConfig = new SolverManagerConfig().withParallelSolverCount("2");
    //SolverManagerConfig managerConfig = new SolverManagerConfig().withParallelSolverCount("10");
    //SolverManagerConfig managerConfig = new SolverManagerConfig().withParallelSolverCount("400");
    SolverManagerConfig managerConfig = new SolverManagerConfig().withParallelSolverCount("AUTO");
    
    SolverManager<Schedule, UUID> solverManager = SolverManager.create(config ,managerConfig);
    
    SolverJob<Schedule, UUID> solverJob = solverManager.solve(UUID.randomUUID(), problem);

    // This is a blocking call until the solving ends
    Schedule solution = solverJob.getFinalBestSolution();

    return request.createResponseBuilder(HttpStatus.OK)
        .header("Content-Type", "application/json")
        .body(solution)
        .build();
}

Question 1: Does anyone know how to set up Azure so that each HTTP call causes a scale-out to a new instance? I would like this so that each solver isn't competing for resources. I have tried to configure this by setting FUNCTIONS_WORKER_PROCESS_COUNT=1 and maxConcurrentRequests=1. I have also tried changing OptaPlanner's parallelSolverCount and moveThreadCount to different values without any noticeable difference.

Question 2: Should I be using Quarkus with Azure instead of the core MVN package? I've read that Geoffrey De Smet answered, "As for AWS Lambda (serverless): Quarkus is your friend".

I'm out of my element here as I haven't coded with Java for over 20 years AND I'm new to both Azure Functions and OptaPlanner. Any advice would be greatly appreciated.

Thanks!


2 Answers


Consider using OptaPlanner's Quarkus integration to compile natively. That is better for serverless deployments because it dramatically reduces the startup time. The READMEs of the OptaPlanner quickstarts that use Quarkus explain how.

By switching from OptaPlanner in plain Java to OptaPlanner in Quarkus (which isn't a big difference), a few magical things will happen:

  1. The parsing of solverConfig.xml with an XML parser won't happen at runtime during bootstrap, but at build time. If it's in src/main/resources/solverConfig.xml, Quarkus will automatically pick it up to configure the SolverManager to inject (see the sketch after this list).
  2. No reflection at runtime.
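
For illustration, a minimal sketch of what that injected SolverManager looks like in a Quarkus REST resource, assuming the optaplanner-quarkus extension and the Schedule planning solution from the question (the ScheduleResource class and the /solve path are made-up names for this example):

import java.util.UUID;
import javax.inject.Inject;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import org.optaplanner.core.api.solver.SolverJob;
import org.optaplanner.core.api.solver.SolverManager;

@Path("/solve")
public class ScheduleResource {

    // Assembled at build time from src/main/resources/solverConfig.xml by the
    // optaplanner-quarkus extension; no XML parsing happens per request.
    @Inject
    SolverManager<Schedule, UUID> solverManager;

    @POST
    public Schedule solve(Schedule problem) throws Exception {
        SolverJob<Schedule, UUID> solverJob = solverManager.solve(UUID.randomUUID(), problem);
        // Blocks until solving ends, just like the Azure Function above.
        return solverJob.getFinalBestSolution();
    }
}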

You will want to start 1 run per dataset. So parallelSolverCount shouldn't be higher than 1, and no run should handle 2 datasets (not even sequentially). If a run gets 8000 cpuMillis, you can use moveThreadCount=4 for it to get better results faster. If it only gets 1000 cpuMillis (= 1 core), don't use move threads. Verify that each run gets enough memory.
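
A hedged sketch of that sizing advice in code, using the plain-Java setup from the question (the SolverSetup class name is made up, and "4" is only an example value, not a recommendation for specific hardware):

import java.util.UUID;
import org.optaplanner.core.api.solver.SolverManager;
import org.optaplanner.core.config.solver.SolverConfig;
import org.optaplanner.core.config.solver.SolverManagerConfig;

public class SolverSetup {

    static SolverManager<Schedule, UUID> buildSolverManager() {
        SolverConfig solverConfig = SolverConfig.createFromXmlResource("solverConfig.xml");
        // Move threads only pay off when the instance really has the cores to back them,
        // e.g. "4" on roughly 8000 cpuMillis; on a single core leave the default ("NONE").
        solverConfig.setMoveThreadCount("4");

        // One dataset per instance, so never run more than one solver in parallel.
        SolverManagerConfig solverManagerConfig = new SolverManagerConfig()
                .withParallelSolverCount("1");

        return SolverManager.create(solverConfig, solverManagerConfig);
    }
}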

– Geoffrey De Smet
  • Thank you for your response!!! I will try these things. One of the reasons I moved from Spring Boot to plain Java was so that I could configure the timeout as a request parameter instead of just using the solverConfig.xml. If I move to Quarkus, would this still be available? I left out this statement from my original post: `config.getTerminationConfig().setMinutesSpentLimit(Long.parseLong(request.getQueryParameters().get("solveTimeLimit")));` – Jerod Houghtelling Jun 17 '22 at 15:10
  • Not in an ideal way (create a jira for that)... But you can `@Inject SolverConfig`, clone and modify per dataset (= you only have 1 dataset per pod so no need to clone it) and build the SolverManager or SolverFactory yourself. – Geoffrey De Smet Jun 18 '22 at 09:31
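
A rough sketch of the workaround described in that comment, again assuming the Quarkus extension; the resource class, query parameter handling, and names here are illustrative, not the library's prescribed approach:

import java.util.UUID;
import javax.inject.Inject;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;
import org.optaplanner.core.api.solver.SolverManager;
import org.optaplanner.core.config.solver.SolverConfig;
import org.optaplanner.core.config.solver.SolverManagerConfig;
import org.optaplanner.core.config.solver.termination.TerminationConfig;

@Path("/solve")
public class ConfigurableSolveResource {

    // The SolverConfig that Quarkus builds from solverConfig.xml at build time.
    @Inject
    SolverConfig solverConfig;

    @POST
    public Schedule solve(@QueryParam("solveTimeLimit") long solveTimeLimitMinutes,
            Schedule problem) throws Exception {
        // With only one dataset per pod, mutating the injected config in place is fine;
        // otherwise copy it before changing the termination.
        solverConfig.setTerminationConfig(
                new TerminationConfig().withMinutesSpentLimit(solveTimeLimitMinutes));
        SolverManager<Schedule, UUID> solverManager =
                SolverManager.create(solverConfig, new SolverManagerConfig());
        return solverManager.solve(UUID.randomUUID(), problem).getFinalBestSolution();
    }
}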

As for your Question 1, unfortunately I don't have a solution for Azure Functions, but let me point you to a blog post about running (and scaling) OptaPlanner workloads on OpenShift, which could address some of your concerns at the architecture level.

Scaling is only static for now (the number of replicas is specified manually), but it can be paired with KEDA to scale based on the number of pending datasets.

It is important to note that the optaplanner-operator is only experimental at this point.

– Radovan Synek