
We have many batch jobs that are currently scheduled via cron expressions in a single application. We would like to isolate these jobs better and therefore move them to Spring Cloud Task.

But reading the documentation [1], I come to the conclusion that I have to use a triggertask (source) which in turn sends a TaskLaunchRequest to a tasklauncher (sink) to finally launch the new process.

This means (if I have only one task/batch) I need at least the following JVM processes running to trigger one new process:

  • flow server
  • triggertask (source)
  • tasklauncher (sink)

OK, the flow server and tasklauncher will be shared by any upcoming tasks, but a triggertask can only take the cron definition for a single task and therefore has to be replicated for every additional task definition. So I need at least one "nanny process" for each task?

Really? This sounds like huge overkill. I would have expected cron scheduling to be core functionality of the task definition, so that the only thing needed would be the flow server.

Do I understand this correctly, or is there anything I have missed? Is there an easier way to do this in the Spring Cloud environment? I really like the idea of having a flow server start new JVMs when required, but all these additional processes feel like the wrong approach.

If this should run on Cloud Foundry, e.g. http://run.pivotal.io, then I have a cron scheduler for a single job costing me $35/month (because as of Java Buildpack 4.0, a JVM process with only 512MB will no longer start [2]). That's an expensive cron definition...

[1] https://github.com/spring-cloud/spring-cloud-stream-app-starters/tree/master/triggertask/spring-cloud-starter-stream-source-triggertask
[2] https://www.cloudfoundry.org/just-released-java-buildpack-4-0/

Sabby Anandan
domi

1 Answer


TL;DR: Don't do that. Either write your own scheduling logic or integrate Spring Cloud Data Flow's REST API with your enterprise scheduler.

The long version
Let me give a bit of history on this and then provide my thoughts on what to do.

When the Spring Cloud Task project was started, we wanted to create a number of sample applications that illustrated task usage in many different use cases. The ability to easily launch a task in reaction to a message being received was one use case we identified to create a sample around. You can see that sample here and here.

When Spring Cloud Data Flow (SCDF) came around, one of the use cases we wanted to address in some way was scheduling tasks. The issue is that we want the SCDF server to be stateless (since it is a cloud native microservice itself), which means that embedding a scheduler isn't an option. From there, we felt that integrating with what each platform provides for scheduling made the most sense. However, it also required the most work. This approach is actually on our roadmap, but we haven't had the user feedback to push that function higher on the list.

The solution we went with to address this requirement in the shorter term is what you find in the documentation today: the re-use of those sample applications in combination with a trigger-task application that handles the cron piece of the puzzle to launch tasks at a given time.

My personal recommendation is that if you don't have a scheduler that you already use, write a Boot app that uses Quartz or some other internal scheduler (Spring Scheduler, etc.) to call the SCDF API to launch the tasks on a given schedule. Given the DataFlowTemplate available, writing your own scheduler should be straightforward.
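To illustrate the idea of a small scheduler app calling the SCDF API, here is a minimal sketch that uses only the JDK (Java 11+) rather than Quartz or DataFlowTemplate. It assumes the SCDF server exposes task launches as a `POST` to `/tasks/executions` with a `name` parameter; check that against the REST API guide for your server version. The class name `TaskLaunchScheduler` and its methods are hypothetical, and it launches at a fixed period instead of evaluating a cron expression, which Quartz or Spring's scheduler would add.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: periodically asks an SCDF server to launch a task definition.
public class TaskLaunchScheduler {

    private final String serverUrl;
    private final HttpClient client = HttpClient.newHttpClient();

    public TaskLaunchScheduler(String serverUrl) {
        // Drop a trailing slash so the path can be appended safely.
        this.serverUrl = serverUrl.endsWith("/")
                ? serverUrl.substring(0, serverUrl.length() - 1)
                : serverUrl;
    }

    // Builds the launch URI. The endpoint path and "name" parameter are assumptions.
    public URI launchUri(String taskName) {
        String encoded = URLEncoder.encode(taskName, StandardCharsets.UTF_8);
        return URI.create(serverUrl + "/tasks/executions?name=" + encoded);
    }

    // Sends one launch request and returns the HTTP status code.
    public int launch(String taskName) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(launchUri(taskName))
                .timeout(Duration.ofSeconds(30))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    // Launches the task immediately and then at a fixed period (not a cron expression).
    public ScheduledExecutorService scheduleEvery(String taskName, Duration period) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                System.out.println("Launched " + taskName + ", HTTP " + launch(taskName));
            } catch (Exception e) {
                // Swallow the failure so later runs are still scheduled.
                System.err.println("Launch of " + taskName + " failed: " + e);
            }
        }, 0, period.toMillis(), TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

A single instance could serve every task definition, for example `new TaskLaunchScheduler("http://localhost:9393").scheduleEvery("nightly-report", Duration.ofHours(24))`, where the URL and task name are placeholders. That is one scheduler process in total instead of one trigger process per task.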

M. Deinum
Michael Minella
  • Many thanks for these details! So if I understand correctly, there will be a native scheduler integration in the future, in our case with CF? Is there an issue I can follow? – domi Jul 18 '17 at 17:11
  • Hi, @Domi. For CF specifically, there's an MVP of CF-Scheduler in the works that will have the ability to schedule and launch Tasks via SCDF's REST-APIs. The idea is to define the DSL in SCDF and use the REST-API in CF-Scheduler to schedule it for a desired date/time or cron. The CF-Scheduler team is roughly targeting the GA release in August. We also have plans to interact with the CF-Scheduler directly (via its APIs and service binding) from SCDF. Our dashboard will have the ability to supply date/time and cron expressions for each Task. – Sabby Anandan Jul 18 '17 at 17:39