
What is the difference between an Oozie workflow, coordinator and bundle?

An Oozie workflow defines a sequence of actions, and we need to invoke it manually every time we want it to run, whereas the same workflow can be scheduled through a coordinator. Is this understanding correct?

Then what extra does a bundle provide?

I guess it is used to schedule a set of coordinators. But then why can't one coordinator be used to schedule another coordinator, the way one workflow can have a sub-workflow?

Kaushik Lele
    If coordinator one is scheduled at 7 am and coordinator two is scheduled at 10 am, and we bundle these two together: 1) Do we need to schedule the bundle as well? 2) If coordinator one fails or is delayed beyond 10 am, will the bundle stop coordinator two from executing? Could you please clarify? – chandra Apr 19 '16 at 18:03

3 Answers


Workflow:

It is a sequence of actions. It is written in XML, and the actions can be MapReduce, Hive, Pig, etc.
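As a rough illustration, a minimal workflow.xml that runs a Pig step followed by a Hive step might look like the sketch below. The node names, script names, and the `${jobTracker}`/`${nameNode}` properties are assumptions (they would normally be supplied by job.properties), and the schema versions should be checked against your Oozie release. Note that nothing in it says *when* to run.

```xml
<!-- Hypothetical workflow: a Pig cleaning step, then a Hive load step -->
<workflow-app name="log-etl-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="clean-logs"/>

  <!-- Action 1: Pig script (clean.pig is a placeholder name) -->
  <action name="clean-logs">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>clean.pig</script>
    </pig>
    <ok to="load-hive"/>
    <error to="fail"/>
  </action>

  <!-- Action 2: Hive script, reached only if the Pig step succeeded -->
  <action name="load-hive">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load.q</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Failed at [${wf:lastErrorNode()}]: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Submitted on its own (for example with `oozie job -config job.properties -run`), such a workflow executes once and then finishes.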

Coordinator:

It is a program that triggers actions (commonly workflow jobs) when a set of conditions is met. Conditions can be a time frequency, the availability of input data, or other external events.
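A hedged sketch of a coordinator.xml that combines both kinds of condition: it materializes one action per day, but each action waits until that day's input directory contains a `_SUCCESS` flag. The HDFS host and path, the dates, and the `${wfAppPath}` property are hypothetical.

```xml
<!-- Hypothetical coordinator: daily, and gated on input data being present -->
<coordinator-app name="daily-log-coord" frequency="${coord:days(1)}"
                 start="2015-10-24T07:00Z" end="2016-10-24T07:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- One directory of logs per day; host and path are placeholders -->
    <dataset name="raw-logs" frequency="${coord:days(1)}"
             initial-instance="2015-10-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode:8020/data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>

  <!-- Condition: the dataset instance for the current day must be ready -->
  <input-events>
    <data-in name="input" dataset="raw-logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>

  <!-- What to trigger: the workflow, with the resolved input path passed in -->
  <action>
    <workflow>
      <app-path>${wfAppPath}</app-path>
      <configuration>
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

The workflow definition itself is unchanged; the coordinator only adds *when* it runs and *on what data*.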

Bundle:

It is defined as a higher-level Oozie abstraction that batches a set of coordinator jobs. We can also specify the time for the bundle job to start.
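For comparison, a minimal bundle.xml that groups two coordinators under one job might look like this. The coordinator names, paths, and queue property are assumptions; `kick-off-time` is the optional bundle start time mentioned above.

```xml
<!-- Hypothetical bundle grouping two independently scheduled coordinators -->
<bundle-app name="nightly-bundle" xmlns="uri:oozie:bundle:0.2">
  <controls>
    <!-- When the bundle starts submitting its coordinators -->
    <kick-off-time>2015-10-24T06:00Z</kick-off-time>
  </controls>

  <coordinator name="coord-one">
    <app-path>${nameNode}/apps/coord-one</app-path>
  </coordinator>

  <coordinator name="coord-two">
    <app-path>${nameNode}/apps/coord-two</app-path>
    <!-- Properties passed only to this coordinator -->
    <configuration>
      <property>
        <name>queueName</name>
        <value>reporting</value>
      </property>
    </configuration>
  </coordinator>
</bundle-app>
```

The bundle adds no ordering between its coordinators: each keeps its own frequency and input conditions, and the Oozie documentation points to coordinator data dependencies for building pipelines. What the bundle does add is a single job ID, so the whole group can be started, suspended, resumed, or killed as one unit.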

madhu
  • Thanks for the answer. But these definitions still do not clarify the difference between a bundle and a coordinator, or why a bundle is needed. – Kaushik Lele Oct 24 '15 at 05:52
  • Just a higher level of abstraction: workflows are grouped by a coordinator, and coordinators are grouped by a bundle. – madhu Oct 24 '15 at 16:28

A workflow has no time specification for running a Hadoop job. A coordinator job specifies its timing in coordinator.xml using the frequency attribute. A collection of coordinator jobs is treated as a bundle job. In a bundle job, individual users can configure their own jobs through their own job.properties.
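To make the contrast concrete, a workflow.xml has no timing attributes at all, while a coordinator declares them on its root element. A minimal time-only coordinator might look like the following; the hourly frequency, the dates, and `${wfAppPath}` (assumed to be set in that job's own job.properties) are illustrative.

```xml
<!-- Hypothetical time-only coordinator: runs the workflow every hour -->
<coordinator-app name="hourly-coord"
                 frequency="${coord:hours(1)}"
                 start="2015-10-24T00:00Z" end="2015-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- Resolved from the job.properties submitted with this coordinator -->
      <app-path>${wfAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```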


As I understand it, a bundle groups several coordinators, which makes them easier to manage, view, and start/stop together.

For example, suppose we have two data pipelines: one for log handling (collect/parse/ETL) and one for business logic.

Then I create two bundles, one to group each kind of coordinator.
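Following that example, the log-handling bundle might simply list its collect, parse, and ETL coordinators, while the business-logic bundle would be a second, separate bundle.xml. All names and paths here are hypothetical.

```xml
<!-- Hypothetical bundle for the log-handling pipeline only -->
<bundle-app name="log-pipeline-bundle" xmlns="uri:oozie:bundle:0.2">
  <coordinator name="log-collect">
    <app-path>${nameNode}/apps/logs/collect-coord</app-path>
  </coordinator>
  <coordinator name="log-parse">
    <app-path>${nameNode}/apps/logs/parse-coord</app-path>
  </coordinator>
  <coordinator name="log-etl">
    <app-path>${nameNode}/apps/logs/etl-coord</app-path>
  </coordinator>
</bundle-app>
```

Pausing the log pipeline for maintenance is then a single suspend (or kill) on this bundle's job ID, leaving the business-logic bundle running.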

Robin Wang