I have a big pipeline that takes a few hours to run. A small part of it needs to run quite often. How do I run just that part without triggering the entire pipeline?
There are multiple ways to specify which nodes or parts of your pipeline to run:

1. Use `kedro run` parameters like `--to-nodes`, `--from-nodes` or `--node` to explicitly define what needs to be run. `--from-nodes` starts the run at the named nodes and runs everything downstream of them, `--to-nodes` runs the named nodes plus everything upstream that they depend on, and `--node` runs only the named nodes.
2. In `kedro>=0.15.2` you can define multiple pipelines and then run only one of them with `kedro run --pipeline <name>`. If no `--pipeline` parameter is specified, the default pipeline is run. The default pipeline can combine several other pipelines. More information about using modular pipelines: https://kedro.readthedocs.io/en/latest/04_user_guide/06_pipelines.html#modular-pipelines
3. Use tags. Tag a small portion of your pipeline with something like "small", and then run `kedro run --tag small`. Read more here: https://kedro.readthedocs.io/en/latest/04_user_guide/05_nodes.html#tagging-nodes
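To make the difference between `--node`, `--from-nodes` and `--to-nodes` concrete, here is a toy model in plain Python. It is not kedro's implementation: the node names and datasets are hypothetical, and a node counts as "downstream" of another when it consumes one of that node's outputs. The three functions mimic what each CLI option selects: only the named nodes, the named nodes plus everything downstream, and the named nodes plus everything upstream.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical toy pipeline: node name -> (input datasets, output datasets).
# This only models the selection logic; it is not kedro's Pipeline class.
NODES: Dict[str, Tuple[List[str], List[str]]] = {
    "load": (["raw"], ["clean"]),
    "features": (["clean"], ["features"]),
    "train": (["features"], ["model"]),
    "report": (["model"], ["report"]),
}


def _children(name: str) -> Set[str]:
    """Nodes that consume any output of `name`."""
    outputs = set(NODES[name][1])
    return {other for other, (ins, _) in NODES.items() if outputs & set(ins)}


def _parents(name: str) -> Set[str]:
    """Nodes that produce any input of `name`."""
    inputs = set(NODES[name][0])
    return {other for other, (_, outs) in NODES.items() if inputs & set(outs)}


def _closure(start: Iterable[str], step: Callable[[str], Set[str]]) -> Set[str]:
    """The start nodes plus everything reachable by repeatedly applying `step`."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(step(current))
    return seen


def only_nodes(names: List[str]) -> Set[str]:
    """Roughly what --node selects: just the named nodes."""
    return set(names)


def from_nodes(names: List[str]) -> Set[str]:
    """Roughly what --from-nodes selects: named nodes plus all downstream nodes."""
    return _closure(names, _children)


def to_nodes(names: List[str]) -> Set[str]:
    """Roughly what --to-nodes selects: named nodes plus all upstream nodes."""
    return _closure(names, _parents)


print(sorted(from_nodes(["features"])))  # ['features', 'report', 'train']
print(sorted(to_nodes(["features"])))    # ['features', 'load']
print(sorted(only_nodes(["features"])))  # ['features']
```

In other words, if the part you run often sits in the middle of the pipeline, `--from-nodes` alone would still drag in everything after it, while `--node` (or a tag) limits the run to exactly that part.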

+1 We use tags most often for this type of work. Thanks for sharing the modular-pipelines link. This is a feature that we have yet to really explore. – Waylon Walker Dec 02 '19 at 04:36
I would recommend getting your tags or pipelines set up to run correctly from the CLI, as @idanov suggested. It will be much easier for you in the long run when moving to production. I would also add that you can do quite a bit of ad hoc pipeline trimming and running inside of Python; here are some examples.
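Filter by tags:
```python
# `pipeline` is an existing kedro Pipeline; each call returns a new Pipeline
nodes = pipeline.only_nodes_with_tags('cars')
```
Filter by node name:
```python
nodes = pipeline.only_nodes('b_int_cars')
```
Filter nodes by partial name:
```python
query_string = 'cars'
# collect the names of all nodes whose name contains query_string
node_names = [
    node.name
    for node in pipeline.nodes
    if query_string in node.name
]
nodes = pipeline.only_nodes(*node_names)
```
Only nodes with tags, OR (nodes carrying any of the tags):
```python
nodes = pipeline.only_nodes_with_tags('cars', 'trains')
```
Only nodes with tags, AND (nodes carrying both tags):
```python
raw_nodes = pipeline.only_nodes_with_tags('raw')
car_nodes = pipeline.only_nodes_with_tags('cars')
raw_car_nodes = raw_nodes & car_nodes  # intersection of the two pipelines

# equivalent: chain the filters
raw_car_nodes = (
    pipeline
    .only_nodes_with_tags('raw')
    .only_nodes_with_tags('cars')
)
```
Add pipelines:
```python
car_nodes = pipeline.only_nodes_with_tags('cars')
train_nodes = pipeline.only_nodes_with_tags('trains')
transportation_nodes = car_nodes + train_nodes  # union of both pipelines
```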
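To see why `only_nodes_with_tags('cars', 'trains')` behaves like an OR while `&` (or chaining the filters) behaves like an AND, here is a toy model using plain Python sets. The node names and tags are hypothetical, and the function only mimics the semantics of the kedro calls above (any-of-the-tags matching, `&` as intersection, `+` as union); it is not kedro's code.

```python
from typing import Dict, Set

# Hypothetical node name -> tags mapping (not a real kedro project).
TAGS: Dict[str, Set[str]] = {
    "a_raw_cars": {"raw", "cars"},
    "b_int_cars": {"cars"},
    "a_raw_trains": {"raw", "trains"},
    "c_report": {"report"},
}


def only_nodes_with_tags(nodes: Set[str], *tags: str) -> Set[str]:
    """Keep the nodes that carry ANY of the given tags (the OR behaviour)."""
    wanted = set(tags)
    return {name for name in nodes if TAGS[name] & wanted}


everything = set(TAGS)

# OR: a single call with several tags
cars_or_trains = only_nodes_with_tags(everything, "cars", "trains")

# AND: intersect two filtered sets (like `raw_nodes & car_nodes`) ...
raw_and_cars = (
    only_nodes_with_tags(everything, "raw")
    & only_nodes_with_tags(everything, "cars")
)
# ... or chain the filters (like `.only_nodes_with_tags('raw').only_nodes_with_tags('cars')`)
chained = only_nodes_with_tags(only_nodes_with_tags(everything, "raw"), "cars")

# Union of two filtered sets (kedro spells this `car_nodes + train_nodes`;
# plain Python sets use `|`)
transportation = (
    only_nodes_with_tags(everything, "cars")
    | only_nodes_with_tags(everything, "trains")
)

print(sorted(cars_or_trains))  # ['a_raw_cars', 'a_raw_trains', 'b_int_cars']
print(sorted(raw_and_cars))    # ['a_raw_cars']
```

The takeaway is that a union of two single-tag filters selects the same nodes as one multi-tag call, whereas narrowing down to nodes that carry both tags needs `&` or chained filters.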
The above was a snippet from my personal kedro notes.
