Kedro is an open-source Python library that helps you build production-ready data and analytics pipelines.
Questions tagged [kedro]
202 questions
7
votes
1 answer
How to run the nodes in sequence as declared in kedro pipeline?
In a Kedro pipeline, nodes (something like Python functions) are declared sequentially. In some cases, the input of one node is the output of the previous node. However, sometimes, when kedro run is called from the command line, the nodes are not run…

Baenka
- 243
- 3
- 15
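A likely explanation for the question above is that run order follows data dependencies rather than declaration order. The sketch below illustrates that idea with the standard library only; it is not Kedro's implementation, and the node and dataset names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: name -> (input datasets, output datasets).
# They are deliberately declared out of dependency order.
nodes = {
    "train_model": ({"features"}, {"model"}),
    "clean_data": ({"raw"}, {"clean"}),
    "build_features": ({"clean"}, {"features"}),
}


def execution_order(nodes):
    """Order nodes so each one runs after the nodes that produce its inputs."""
    # Map every output dataset to the node that produces it.
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # For each node, collect the nodes it depends on (its predecessors).
    graph = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(nodes)
print(order)
```

Nodes with no dependency between them have no guaranteed relative order in this scheme, which would match the behaviour described in the question.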
6
votes
1 answer
How to process huge datasets in kedro
I have a pretty big (~200 GB, ~20M lines) raw JSONL dataset. I need to extract important properties from it and store the intermediate dataset in CSV for further conversion into something like HDF5, Parquet, etc. Obviously, I can't use JSONDataSet…

eawer
- 1,398
- 3
- 13
- 25
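For the extraction step described above, one common approach is to stream the file line by line so that only one record is in memory at a time. This is a minimal sketch using only the standard library; the function name `jsonl_to_csv` and the field names are hypothetical, and wiring it into a Kedro dataset is left out.

```python
import csv
import io
import json


def jsonl_to_csv(src, dst, fields):
    """Copy the selected fields of each JSON line in src to CSV rows in dst."""
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    count = 0
    for line in src:  # iterating a file object reads one line at a time
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Keep only the requested fields; missing ones become empty cells.
        writer.writerow({f: record.get(f) for f in fields})
        count += 1
    return count


# Small in-memory stand-in for the real ~200 GB file.
sample = io.StringIO('{"id": 1, "name": "a", "extra": "x"}\n{"id": 2, "name": "b"}\n')
out = io.StringIO()
rows_written = jsonl_to_csv(sample, out, ["id", "name"])
```

With real files, `src` and `dst` would be handles from `open(...)`, with `newline=""` on the CSV output as the `csv` module recommends.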
5
votes
1 answer
DataBricks + Kedro Vs GCP + Kubeflow Vs Server + Kedro + Airflow
We are deploying a data consortium between more than 10 companies. We will deploy several machine learning models (in general, advanced analytics models) for all the companies, and we will administer all the models. We are looking for a solution…

Erick Translateur
- 69
- 3
5
votes
3 answers
Kedro deployment to databricks
Maybe I misunderstand the purpose of packaging, but it doesn't seem too helpful for creating an artifact for production deployment because it only packages code. It leaves out the conf, data, and other directories that make the kedro project…

dres
- 1,172
- 11
- 15
5
votes
2 answers
How to run parts of your Kedro pipeline conditionally?
I have a big pipeline that takes a few hours to run. A small part of it needs to run quite often. How do I run it without triggering the entire pipeline?

idanov
- 136
- 1
- 6
4
votes
1 answer
Running a kedro pipeline with inputs and outputs defined through the command line
I would like to run a kedro pipeline using different inputs and save the results in an output folder, where the input and output paths are provided through the command line.
I saw the possibility of using the kedro.config.TemplatedConfigLoader…

Isy89
- 179
- 8
4
votes
2 answers
How to load a specific catalog dataset instance in kedro 0.17.0?
We were using kedro version 0.15.8 and we were loading one specific item from the catalog this way:
from kedro.context import load_context
get_context().catalog.datasets.__dict__[key]
Now, we are changing to kedro 0.17.0 and trying to load the…

Javi Hernandez
- 314
- 8
- 17
4
votes
2 answers
Kedro install - Cannot uninstall `terminado`
When running kedro install I get the following error:
Attempting uninstall: terminado
Found existing installation: terminado 0.8.3
ERROR: Cannot uninstall 'terminado'. It is a distutils installed project and thus we cannot accurately determine…

zeh
- 1,197
- 2
- 14
- 29
4
votes
2 answers
Override nested parameters using kedro run CLI command
I am using nested parameters in my parameters.yml and would like to override these using runtime parameters for the kedro run CLI command:
train:
  batch_size: 32
  train_ratio: 0.9
  epochs: 5
The following doesn't seem to work:
kedro run…

evolved
- 1,850
- 19
- 40
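To make the intent of the question above concrete, here is a minimal sketch of overriding one nested value by a dotted key. The dotted-key convention and the helper `apply_override` are assumptions for illustration only; this does not show Kedro's CLI syntax or how Kedro merges runtime parameters.

```python
import copy


def apply_override(params, dotted_key, value):
    """Return a copy of params with the value at dotted_key replaced."""
    result = copy.deepcopy(params)  # leave the original parameters untouched
    target = result
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Walk down one level, creating intermediate dicts if needed.
        target = target.setdefault(key, {})
    target[leaf] = value
    return result


# The nested parameters from the question, as a Python dict.
params = {"train": {"batch_size": 32, "train_ratio": 0.9, "epochs": 5}}
updated = apply_override(params, "train.batch_size", 64)
```

Only the targeted leaf changes; sibling keys under `train` keep their values, which is the behaviour the asker appears to want.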
4
votes
1 answer
How do I add many CSV files to the catalog in Kedro?
I have hundreds of CSV files that I want to process similarly. For simplicity, we can assume that they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much rather not give each file a different name and keep…

Srikiran
- 309
- 1
- 3
- 9
4
votes
1 answer
How to write a list of dataframes into multiple sheets of ExcelLocalDataSet?
The input is a list of dataframes. How can I save it into an ExcelLocalDataSet where each dataframe is a separate sheet?

James Wong
- 45
- 1
- 7
4
votes
2 answers
Pipeline can't find nodes in kedro
I was following the pipelines tutorial, created all the needed files, and started kedro with kedro run --node=preprocessing_data, but got stuck with this error message:
ValueError: Pipeline does not contain nodes named ['preprocessing_data'].
If I run kedro…

eawer
- 1,398
- 3
- 13
- 25
4
votes
1 answer
Setting parameters in Kedro Notebook
Is it possible to overwrite properties taken from the parameters.yaml file within a Kedro notebook?
I am trying to dynamically change parameter values within a notebook. I would like to be able to give users the ability to run a standard pipeline…

DHollett
- 43
- 2
4
votes
1 answer
Kedro with MongoDB and other document databases?
What's the best practice for using kedro with MongoDB or other document databases? MongoDB, for example, doesn't have a query language analogous to SQL. Most Mongo "queries" in Python (using PyMongo) will look something like this:
from pymongo…

Benjamin Jack
- 83
- 1
- 5
4
votes
1 answer
Kedro: How to pass multiple same data from a directory as a node input?
I have a directory with multiple files in the same data format (1 file per day). It's like one dataset split into multiple files.
Is it possible to pass all the files to a Kedro node without specifying each file? So they all get processed…

921kiyo
- 584
- 4
- 14