
Context: in my country, a new instant payment system is planned for November. Basically, the Central Bank will provide two endpoints: (1) a POST endpoint to which we post a single money transfer, and (2) a GET endpoint that returns the result of a previously sent money transfer; results can come back completely out of order. Each GET response carries only one money transfer result, and a response header tells us whether there is another result we must GET. It never says how many results are available: if there is a result, it comes back in the GET response together with a flag saying whether it is the last one or whether more remain for the next GET.
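To pin down the GET contract described above, here is a minimal Java sketch of draining the pending results. PollResult, ResultSource and drain are hypothetical names, and the String body stands in for whatever payload the Central Bank actually returns; the real endpoint and header details are not given in the question.

```java
import java.util.function.Consumer;

class ResultDrainer {

    /** One GET response: a single transfer result plus the "more results" flag read from the header. */
    record PollResult(String body, boolean hasMore) {}

    /** Hypothetical wrapper around the Central Bank GET endpoint; returns null when nothing is pending. */
    interface ResultSource {
        PollResult next();
    }

    /**
     * Keeps calling GET while the previous response says another result is waiting.
     * A loop is used instead of re-invoking the function, so a long backlog cannot grow the stack.
     * Returns the number of results handed to the sink (Kafka producer, DAO, ...).
     */
    static int drain(ResultSource source, Consumer<String> sink) {
        int count = 0;
        PollResult result;
        do {
            result = source.next();
            if (result == null) {
                break; // no result available right now
            }
            sink.accept(result.body());
            count++;
        } while (result.hasMore());
        return count;
    }
}
```

Each call to drain empties only the backlog that exists at that moment; whatever calls it (a scheduler or a never-ending loop) decides when to call it again.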

Top limitation: from the moment the end user taps the Transfer button in his/her mobile app until the final result (success or failure) is shown on the mobile screen, at most 10 seconds may elapse.

Strategy: I want a scheduler that triggers a GET to the Central Bank every second, or even more often. The scheduler will basically invoke a simple function that

  1. calls the GET endpoint,
  2. pushes the result to Kafka or persists it in a database, and
  3. if the response headers indicate that more results are available, starts the same function again.
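The strategy above can be sketched with the JDK's ScheduledExecutorService, so it runs without a Spring context. pollOnce stands for the hypothetical function in steps 1 to 3 (ideally written as a loop that keeps calling GET while the header says more results are waiting, rather than re-invoking itself); the 1-second delay is taken from the question.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class PollScheduler {

    /**
     * Runs pollOnce repeatedly, waiting delayMs after each run finishes before starting the next.
     * Fixed delay (not fixed rate) means a slow GET never overlaps with the following poll.
     * Exceptions are caught because an uncaught exception suppresses all later executions.
     */
    static ScheduledFuture<?> start(ScheduledExecutorService scheduler, Runnable pollOnce, long delayMs) {
        Runnable safe = () -> {
            try {
                pollOnce.run();
            } catch (RuntimeException e) {
                // Log and keep going; the next tick will try again.
                System.err.println("poll failed: " + e.getMessage());
            }
        };
        return scheduler.scheduleWithFixedDelay(safe, 0, delayMs, TimeUnit.MILLISECONDS);
    }

    /** A single thread is enough: polls are sequential by design. */
    static ScheduledExecutorService newSingleThreadScheduler() {
        return Executors.newSingleThreadScheduledExecutor();
    }
}
```

With a delay d, a result that arrives just after a poll finishes waits at most about d plus one GET round trip before it is fetched, and that has to fit inside the 10-second limit together with the POST and the push to the mobile app. In Spring, the closest equivalent is a bean method annotated with @Scheduled(fixedDelay = 1000) in an application with @EnableScheduling.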

Issue: since we are Spring users/followers, I thought my decision was between Spring Batch and org.springframework.scheduling.annotation.SchedulingConfigurer/TaskScheduler. I have used Spring Batch successfully for a while, but never with such a short trigger period (never with a 1-second period). I stumbled upon a discussion that made me wonder whether, in my case (a very simple task but with a very short period), I should consider Spring Cloud Data Flow or Spring Cloud Task instead of Spring Batch.

According to this answer, "... Spring Batch is ... designed for the building of complex compute problems ... You can orchestrate Spring Batch jobs with Spring Scheduler if you want". Based on that, it seems I shouldn't use Spring Batch, because my case isn't complex. The challenging design decision is about a short-period trigger and triggering another batch from the current one, rather than about transformation, calculation or ETL processing. Nevertheless, as far as I can see, Spring Batch with its tasklets is well designed for restarting, resuming and retrying, and it fits a never-ending scenario well, whereas org.springframework.scheduling seems to be only a way to trigger an event based on a period configuration. Well, this is my feeling based on personal use and study.
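For comparison with the retry support that Spring Batch (or Spring Retry) would provide, here is what a hand-rolled retry with linear backoff around a single GET might look like if only a plain scheduler is used. This is not Spring Batch's API; Retry.withRetry is a hypothetical helper, and the attempt count and backoff are placeholders that would have to be sized against the 10-second budget.

```java
import java.util.function.Supplier;

class Retry {

    /**
     * Calls action up to maxAttempts times, sleeping backoffMs * attempt between failures
     * (linear backoff). Rethrows the last exception if every attempt fails.
     */
    static <T> T withRetry(Supplier<T> action, int maxAttempts, long backoffMs) throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt); // wait a little longer after each failure
                }
            }
        }
        throw last; // non-null here: the loop ran at least once and every attempt failed
    }
}
```

Restart and resume across process crashes (Spring Batch's job repository) are not covered by this; it only smooths over transient GET failures within one run.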

According to an answer to someone asking about orchestration of composed tasks, "... you can achieve your design goals using Spring Cloud Data Flow along with the Spring Cloud Task/Spring Batch...". In my case, I don't see composed tasks: the second trigger doesn't depend on the result of the previous one. It sounds more like "chained" tasks than "composed" ones. I have never used Spring Cloud Data Flow, but it seems a nice candidate for managing and viewing the triggered tasks (console/dashboards). Nevertheless, I couldn't find anything describing limitations or rules of thumb for short-period triggers and "chained" triggers.

So my direct question is: which Spring project is currently recommended for such a short trigger period? Assuming Spring Cloud Data Flow is used as the manager/dashboard, which Spring project is recommended as the trigger in such short-period scenarios? Spring Cloud Task seems designed for calling complex functions, Spring Batch seems to add much more than I need, and org.springframework.scheduling.* lacks integration with Spring Cloud Data Flow. As an analogy and not as a comparison: in AWS, the documentation clearly says "don't use CloudWatch for less than one minute. If you want less than one minute, start CloudWatch for each minute that start another scheduler/cron each second". There might be a well-known rule of thumb for a simple task that needs to be triggered every second, or even more often, that takes advantage of the Spring family's approach, concerns and experience.

marc_s
Jim C
  • Hello, Spring Cloud Task is a project that lets ephemeral Boot apps work well in a cloud environment. One of its purposes is to record the exit codes of your Boot app, which would be helpful to you. Spring Cloud Data Flow does offer scheduling down to the second using Kubernetes CronJobs or PCF Scheduler on Cloud Foundry, so this may be helpful to you. If you need scheduling at a sub-second level, you can use another scheduling framework and have it issue RESTful API calls to Spring Cloud Data Flow to launch its tasks. That way you can track the executions of your tasks and relaunch them if necessary. – Glenn Renfro Jul 20 '20 at 13:42
  • @GlennRenfro, thanks. Our microservices run on Red Hat OpenShift. I guess I can assume Spring Cloud Data Flow works well with Kubernetes CronJobs on OpenShift, right? I can't find anyone using it for schedulers that trigger every second. Do you see anything naive or odd in planning to use Spring Cloud Data Flow + Kubernetes CronJob for an infinite batch triggered every second? Such a batch only calls a Central Bank GET endpoint and saves its response body in a database. – Jim C Jul 20 '20 at 14:53
  • Do you see any smell of bad practice or a foolish idea? Well, since it is quite a new feature in my country, I can't really predict all scenarios, but surely a more experienced architect has some idea whether Spring Cloud Data Flow + Kubernetes CronJobs is also aimed at 1-second intervals. If it has been used successfully in "1 second interval" cases around the globe, or was also designed for that, then I am on the right path. Even if I face some surprises, there will be some out-of-the-box way to deal with them. – Jim C Jul 20 '20 at 14:56
  • Yes, SCDF does work well with CronJobs. On the second question, I made a mistake: CronJobs on Kubernetes don't go down to the second, only to the minute (I was thinking minutes when I wrote the previous comment). So that will not be a solution for you. You will probably need a scheduler outside of Data Flow making RESTful calls to Data Flow to launch the tasks at the second or sub-second level. On the third question: I've seen companies launch tens of thousands of batches/tasks a day, but scaling to that level depends on the implementation. – Glenn Renfro Jul 21 '20 at 14:24
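Glenn's suggestion of an external scheduler calling Data Flow over REST could look roughly like this. The sketch assumes SCDF's task-launch endpoint POST /tasks/executions with a name parameter and the default server port 9393; verify both against the REST API guide of the SCDF version you deploy. poll-central-bank is a hypothetical task name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class DataFlowLauncher {

    /**
     * Builds a request asking a Spring Cloud Data Flow server to launch a registered task.
     * Assumption: the server exposes POST /tasks/executions and reads the task name from "name".
     */
    static HttpRequest launchRequest(String dataFlowBaseUrl, String taskName) {
        String form = "name=" + URLEncoder.encode(taskName, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(dataFlowBaseUrl + "/tasks/executions"))
                .timeout(Duration.ofSeconds(2)) // keep the launch call well inside the 10-second budget
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }
}
```

The request would be sent with HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString()) from whatever scheduler fires every second. Each launch starts a new short-lived task, so one launch per second means 86,400 task executions a day, which is the scaling concern Glenn raises.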

1 Answer


This may be a stupid answer, but why do you need a scheduler here? Wouldn't a never-ending job achieve the goal?

  • You start a job; it does a GET request and pushes the result to Kafka.
  • If the GET response indicates that there are more results, it immediately does another GET and pushes that result to Kafka.
  • If the GET response indicates that there are no more results, it sleeps for 1 second and then does the GET request again.
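The three steps of this answer can be sketched as a plain Java loop. PollResult, ResultSource and Sleeper are hypothetical types introduced for the sketch (Sleeper makes the 1-second wait replaceable in tests), and the sink is where a Kafka publish or database insert would go.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

class NeverEndingPoller {

    /** One GET response: a result body plus the "more results" flag from the header. */
    record PollResult(String body, boolean hasMore) {}

    /** Hypothetical GET wrapper; null means no result is pending. */
    interface ResultSource {
        PollResult next();
    }

    /** Abstraction over Thread.sleep so the idle wait can be faked in tests. */
    interface Sleeper {
        void sleep(long ms) throws InterruptedException;
    }

    /**
     * GET; if more results are pending, GET again immediately;
     * otherwise sleep idleMs and GET again. Runs until keepRunning returns false.
     */
    static void run(ResultSource source, Consumer<String> sink, Sleeper sleeper,
                    long idleMs, BooleanSupplier keepRunning) throws InterruptedException {
        while (keepRunning.getAsBoolean()) {
            PollResult result = source.next();
            if (result != null) {
                sink.accept(result.body()); // e.g. publish to Kafka or insert into the database
            }
            if (result == null || !result.hasMore()) {
                sleeper.sleep(idleMs); // backlog empty: wait before asking again
            }
        }
    }
}
```

In production the sleeper could simply be Thread::sleep, with keepRunning checking a shutdown flag, for example run(source, sink, Thread::sleep, 1000, () -> !Thread.currentThread().isInterrupted()).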

  • Kindly imagine you have to provide such a job with this stack: any Spring framework on OpenShift. How would you design such a "never-ending job"? Well, you have to think about monitoring, resuming in case of exceptions, sleeping 1 second, dealing with possible GET timeouts and so on. I have used Spring Batch jobs before, but never for such intense GET calls at such a short interval. Well, in a "never-ending job" I can say there is no interval at all. Would you first consider Spring Batch or Spring Cloud Task, assuming I don't want to use a scheduler? Is Spring Cloud Data Flow worth it for an infinite job? – Jim C Jul 21 '20 at 19:52
  • Well, to be honest, I haven't used OpenShift. But for my answer, I assume some framework will start my application. From the moment it starts, it will do the above 3 steps. In case some error happens and my app shuts down, all I need is a framework that restarts my application. I guess there is a mechanism in OpenShift that will restart your app if it fails. – Kavithakaran Kanapathippillai Jul 21 '20 at 20:06
  • It's a simple answer, not a stupid one. Anyone can write a task that loops indefinitely, returning items or sleeping for a second. The difficulty comes with making it robust, scaling it horizontally for performance and resilience, and making sure it gets restarted when it falls over and monitored while it's running. Kubernetes springs to mind... – nullTerminator Jul 27 '20 at 21:30
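On the robustness points raised in these comments (GET timeouts, resuming after exceptions), a per-request timeout and the "more results" header check might look like this with the JDK's java.net.http client. The header name X-More-Results and the 2-second timeout are assumptions for illustration; restarting a crashed pod and monitoring it would be left to OpenShift/Kubernetes (restart policy, liveness probes) rather than to this code.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class CentralBankGet {

    // Hypothetical header name; the real one comes from the Central Bank's specification.
    static final String MORE_RESULTS_HEADER = "X-More-Results";

    /** Reads the "more results" flag; a missing header is treated as "no more results". */
    static boolean hasMore(HttpHeaders headers) {
        return headers.firstValue(MORE_RESULTS_HEADER).map(Boolean::parseBoolean).orElse(false);
    }

    /** A GET with a per-request timeout, so one slow call cannot eat the 10-second budget. */
    static HttpRequest resultRequest(URI endpoint, Duration timeout) {
        return HttpRequest.newBuilder(endpoint).timeout(timeout).GET().build();
    }

    /** Sends the GET; a timeout surfaces as HttpTimeoutException, a subclass of IOException. */
    static HttpResponse<String> fetch(HttpClient client, URI endpoint, Duration timeout)
            throws IOException, InterruptedException {
        return client.send(resultRequest(endpoint, timeout), HttpResponse.BodyHandlers.ofString());
    }
}
```

The polling loop can catch the IOException, log it and simply try again on the next iteration instead of letting the thread die, which keeps the "never-ending job" alive between restarts handled by the platform.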