I want to run a pipeline for different files, but some of them don't need all of the defined nodes. How can I skip the nodes a given file doesn't need?
Hi sofiacosta29. Welcome to SO! Your question seems pretty complete to me in some ways, but can you mention what you tried and why it hasn't worked? If you need help, you can look at this suggestion on how to make a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – bart cubrich Nov 05 '19 at 17:43
Thank you! I have tried using tags on the nodes, but I would like to know whether there is a way to run a pipeline without, for example, two of its nodes. That is, instead of running "kedro run --tag=tag1 --tag=tag2..." with many tags, I would use something like "kedro run --except=tag3". Is there any way of doing this? Thank you for your attention! – sofiacosta29 Nov 06 '19 at 15:11
3 Answers
Would modular pipelines help here? You could build two pipelines, one containing just the two "optional" nodes and one containing everything else, and then return a default pipeline that is the sum of the two. Something like this:
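```python
from kedro.pipeline import Pipeline, node


def create_pipelines(**kwargs):
    # Replace each node(...) placeholder with your own node(func, inputs, outputs)
    two_node_pipeline = Pipeline([node(...), node(...)])
    rest_of_pipeline = Pipeline([node(...), node(...), node(...), node(...)])
    return {
        "rest_of_pipeline": rest_of_pipeline,
        # Pipelines can be added together; the default one contains every node
        "__default__": two_node_pipeline + rest_of_pipeline,
    }
```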
Then `kedro run --pipeline rest_of_pipeline` runs the pipeline without those two nodes, and plain `kedro run` runs it with the two extra nodes.
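To see the selection logic without depending on a particular Kedro version, here is a plain-Python sketch in which each "pipeline" is modelled as a frozenset of node names and the dict mirrors what `create_pipelines` returns. The node names and the `select` helper are hypothetical; in Kedro itself the `--pipeline` option does this lookup for you, and Kedro's `+` works on real `Pipeline` objects rather than sets.

```python
# Plain-Python stand-in: each "pipeline" is a frozenset of hypothetical node names
two_node_pipeline = frozenset({"optional_a", "optional_b"})
rest_of_pipeline = frozenset({"load", "clean", "features", "report"})

pipelines = {
    "rest_of_pipeline": rest_of_pipeline,
    # The default is the union of both parts, i.e. every node
    "__default__": two_node_pipeline | rest_of_pipeline,
}


def select(registry, name="__default__"):
    """Hypothetical helper mimicking `kedro run` vs `kedro run --pipeline NAME`."""
    return registry[name]


print(sorted(select(pipelines)))                      # all six node names
print(sorted(select(pipelines, "rest_of_pipeline")))  # ['clean', 'features', 'load', 'report']
```

Modelling pipelines as sets deliberately ignores execution order, since Kedro derives the order from each node's inputs and outputs anyway.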
Otherwise, I think it should be fairly easy to add the `--except` functionality yourself by modifying your `kedro_cli.py`, `ProjectContext`, or `run.py`, whichever your project uses. I might look into doing this...
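Kedro's CLI is built on `click`, and where to hook in depends on the Kedro version, so the following is only a stdlib `argparse` sketch of what a hypothetical repeatable `--except` option could do: collect the node names to skip and drop them from the list of nodes to run. `--except` is not an existing Kedro flag, and `ALL_NODES` and `parse_and_filter` are illustrative names.

```python
import argparse

ALL_NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical node names


def parse_and_filter(argv, all_nodes=ALL_NODES):
    """Return the nodes left after removing every name passed via --except."""
    parser = argparse.ArgumentParser(prog="run")
    # `except` is a Python keyword, so store the values under another name;
    # `append` allows repeating the option: --except node2 --except node5
    parser.add_argument("--except", dest="excluded", action="append", default=None)
    args = parser.parse_args(argv)
    excluded = set(args.excluded or [])
    unknown = excluded - set(all_nodes)
    if unknown:
        # Fail loudly on typos instead of silently running everything
        parser.error(f"unknown node(s): {', '.join(sorted(unknown))}")
    return [name for name in all_nodes if name not in excluded]


print(parse_and_filter(["--except", "node5"]))  # ['node1', 'node2', 'node3', 'node4']
```

Rejecting unknown names is a design choice: with a pure exclusion filter, a misspelled node name would otherwise run the whole pipeline without any warning.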
Kedro sorts the nodes automatically (a topological sort based on each node's inputs and outputs), so the order in which you declare them doesn't matter; see this previous answer: How to run the nodes in sequence as declared in kedro pipeline?
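The ordering is possible because every node declares its inputs and outputs, which together define a dependency graph. Here is a rough plain-Python illustration using the standard library's `graphlib` (Python 3.9+); the node and dataset names are hypothetical, and this is not Kedro's internal code.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes, deliberately declared out of order: name -> (inputs, outputs)
nodes = {
    "report": (["feature_table"], ["summary"]),
    "clean": (["raw"], ["clean_data"]),
    "build_features": (["clean_data"], ["feature_table"]),
}


def execution_order(nodes):
    """Order nodes so that each one runs after the nodes producing its inputs."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    graph = {
        # Free inputs such as "raw" have no producer, so they add no dependency
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(nodes))  # ['clean', 'build_features', 'report']
```

Dropping a node from this graph simply turns its outputs into free inputs of the remaining nodes, which would then have to be available some other way (for example, already saved by an earlier run).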

The initial idea was to maintain the original pipeline and use it for multiple files. However, this seems a good option as well, thank you! – sofiacosta29 Nov 06 '19 at 17:07
What I mean is: given a built pipeline, being able to choose in one instruction which nodes not to run. For example, with a pipeline of nodes "node1", "node2", ..., "node10", being able to run 9 of those nodes without having to name them all, ideally with an instruction like "kedro run --except=node5". Thank you very much for your attention! – sofiacosta29 Nov 07 '19 at 14:57
To filter out a few nodes of a pipeline, you can simply filter the pipeline's list of nodes from inside Python; my favorite way is to use a list comprehension.
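By name:

```python
from kedro.pipeline import Pipeline
from kedro.runner import SequentialRunner

nodes_to_run = [node for node in pipeline.nodes if "dont_run_me" not in node.name]
# Runners take a Pipeline, so wrap the filtered list of nodes in one
SequentialRunner().run(Pipeline(nodes_to_run), io)
```

By tag:

```python
nodes_to_run = [node for node in pipeline.nodes if "dont_run_tag" not in node.tags]
SequentialRunner().run(Pipeline(nodes_to_run), io)
```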
You can filter by any attribute of a node in the same way (`name`, `inputs`, `outputs`, `short_name`, `tags`).
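As a stand-in that runs without Kedro installed, the sketch below uses a hypothetical `Node` dataclass that only mirrors a few of the attribute names listed above (`name`, `inputs`, `outputs`, `tags`). It shows the same comprehension pattern filtering by input dataset and by tag.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in exposing a few of the attributes a Kedro node has."""
    name: str
    inputs: tuple
    outputs: tuple
    tags: frozenset = field(default_factory=frozenset)


pipeline_nodes = [
    Node("clean", ("raw",), ("clean_data",)),
    Node("plot", ("clean_data",), ("figure",), frozenset({"viz"})),
    Node("model", ("clean_data",), ("predictions",)),
]

# By input: skip every node that reads the "clean_data" dataset
no_clean_readers = [n for n in pipeline_nodes if "clean_data" not in n.inputs]
# By tag: skip every node tagged "viz"
no_viz = [n for n in pipeline_nodes if "viz" not in n.tags]

print([n.name for n in no_clean_readers])  # ['clean']
print([n.name for n in no_viz])            # ['clean', 'model']
```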
If you need to run your pipeline this way in production or from the command line, you can either tag your nodes and run with tags, or add a custom `click.option` to the `run` function inside `kedro_cli.py` and apply this filter when the flag is `True`.
Note: this assumes you have your pipeline loaded into memory as `pipeline` and your catalog loaded as `io`.
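A sketch of the "apply the filter when the flag is True" idea, using stdlib `argparse` in place of `click` so it runs on its own. The `--skip-optional` flag, the `optional` tag, and the node list are all hypothetical; in a real project the flag would live on the `run` command and the filter would act on `pipeline.nodes`.

```python
import argparse

# Hypothetical (node name, tags) pairs standing in for pipeline.nodes
NODES = [
    ("load", set()),
    ("extra_stats", {"optional"}),
    ("report", set()),
]


def nodes_for(argv):
    parser = argparse.ArgumentParser(prog="run")
    parser.add_argument("--skip-optional", action="store_true",
                        help="drop nodes tagged 'optional'")
    args = parser.parse_args(argv)
    if args.skip_optional:  # only filter when the flag is set
        return [name for name, tags in NODES if "optional" not in tags]
    return [name for name, _ in NODES]


print(nodes_for([]))                   # ['load', 'extra_stats', 'report']
print(nodes_for(["--skip-optional"]))  # ['load', 'report']
```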

You can also use the `--to-nodes` CLI option: `kedro run --to-nodes node1,node2`. Internally this calls `pipeline.to_nodes("node1", "node2")` (see the `Pipeline.to_nodes` method docs). Please note that you would still need to identify the "final" list of nodes that have to be run.
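`to_nodes` keeps the named nodes plus every node they depend on, directly or transitively, which is why you still have to pick the "final" nodes yourself. Here is a plain-Python illustration of that upstream closure with hypothetical node and dataset names; `upstream_of` mirrors the semantics only and is not Kedro's implementation.

```python
# Hypothetical nodes: name -> (inputs, outputs)
NODES = {
    "node1": (["raw"], ["a"]),
    "node2": (["a"], ["b"]),
    "node3": (["a"], ["c"]),
    "node4": (["b", "c"], ["d"]),
    "node5": (["raw"], ["e"]),
}


def upstream_of(nodes, *targets):
    """Return the targets plus every node they depend on, directly or transitively."""
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    keep, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        inputs, _ = nodes[name]
        # Follow each input back to the node that produces it, if any
        stack.extend(producer[i] for i in inputs if i in producer)
    return sorted(keep)


# Running "everything except node5" here is the same as running up to node4
print(upstream_of(NODES, "node4"))  # ['node1', 'node2', 'node3', 'node4']
```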
