We are using Google Dataflow for batch data processing and are looking for workflow orchestration tools, something similar to what Azkaban does for Hadoop.
The key things we are looking for are:
- Configuring workflows
- Scheduling workflows
- Monitoring and alerting failed workflows
- Ability to rerun failed jobs
We have evaluated Pentaho, but these features are only available in its Enterprise edition, which is expensive. We are currently evaluating Azkaban because it supports javaprocess job types. However, Azkaban was created primarily for Hadoop jobs, so it integrates more deeply with Hadoop infrastructure than with plain javaprocess jobs.
We would appreciate suggestions for open-source or very low-cost solutions.