Questions tagged [kedro]

Kedro is an open source Python library that helps you build production-ready data and analytics pipelines

202 questions
7
votes
1 answer

How to run the nodes in sequence as declared in kedro pipeline?

In a Kedro pipeline, nodes (something like Python functions) are declared sequentially. In some cases, the input of one node is the output of the previous node. However, sometimes, when the kedro run API is called from the command line, the nodes are not run…
Baenka
  • 243
  • 3
  • 15
6
votes
1 answer

How to process huge datasets in kedro

I have a pretty big (~200 GB, ~20M lines) raw JSONL dataset. I need to extract important properties from it and store the intermediate dataset as CSV for further conversion into something like HDF5, Parquet, etc. Obviously, I can't use JSONDataSet…
eawer
  • 1,398
  • 3
  • 13
  • 25
5
votes
1 answer

DataBricks + Kedro Vs GCP + Kubeflow Vs Server + Kedro + Airflow

We are deploying a data consortium between more than 10 companies. We will deploy several machine learning models (in general, advanced analytics models) for all the companies, and we will administer all the models. We are looking for a solution…
5
votes
3 answers

Kedro deployment to databricks

Maybe I misunderstand the purpose of packaging, but it doesn't seem too helpful in creating an artifact for production deployment because it only packages code. It leaves out the conf, data, and other directories that make the kedro project…
dres
  • 1,172
  • 11
  • 15
5
votes
2 answers

How to run parts of your Kedro pipeline conditionally?

I have a big pipeline, taking a few hours to run. A small part of it needs to run quite often, how do I run it without triggering the entire pipeline?
idanov
  • 136
  • 1
  • 6
4
votes
1 answer

Running a kedro pipeline with inputs and outputs defined through the command line

I would like to run a kedro pipeline using different inputs and save the results in an output folder, where the input and output paths are provided through the command line. I saw the possibility of using the kedro.config.TemplatedConfigLoader…
Isy89
  • 179
  • 8
4
votes
2 answers

How to load a specific catalog dataset instance in kedro 0.17.0?

We were using kedro version 0.15.8 and we were loading one specific item from the catalog this way: from kedro.context import load_context get_context().catalog.datasets.__dict__[key] Now, we are changing to kedro 0.17.0 and trying to load the…
Javi Hernandez
  • 314
  • 8
  • 17
4
votes
2 answers

Kedro install - Cannot uninstall `terminado`

When running kedro install I get the following error: Attempting uninstall: terminado Found existing installation: terminado 0.8.3 ERROR: Cannot uninstall 'terminado'. It is a distutils installed project and thus we cannot accurately determine…
zeh
  • 1,197
  • 2
  • 14
  • 29
4
votes
2 answers

Override nested parameters using kedro run CLI command

I am using nested parameters in my parameters.yml and would like to override these using runtime parameters for the kedro run CLI command: train: batch_size: 32 train_ratio: 0.9 epochs: 5 The following doesn't seem to work: kedro run…
evolved
  • 1,850
  • 19
  • 40
4
votes
1 answer

How do I add many CSV files to the catalog in Kedro?

I have hundreds of CSV files that I want to process similarly. For simplicity, we can assume that they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much rather not give each file a different name and keep…
Srikiran
  • 309
  • 1
  • 3
  • 9
4
votes
1 answer

How to write a list of dataframes into multiple sheets of ExcelLocalDataSet?

The input is a list of dataframes. How can I save it into an ExcelLocalDataSet where each dataframe is a separate sheet?
James Wong
  • 45
  • 1
  • 7
4
votes
2 answers

Pipeline can't find nodes in kedro

I was following the pipelines tutorial, created all the needed files, and started kedro with kedro run --node=preprocessing_data, but got stuck with this error message: ValueError: Pipeline does not contain nodes named ['preprocessing_data']. If I run kedro…
eawer
  • 1,398
  • 3
  • 13
  • 25
4
votes
1 answer

Setting parameters in Kedro Notebook

Is it possible to overwrite properties taken from the parameters.yaml file within a Kedro notebook? I am trying to dynamically change parameter values within a notebook. I would like to be able to give users the ability to run a standard pipeline…
DHollett
  • 43
  • 2
4
votes
1 answer

Kedro with MongoDB and other document databases?

What's the best practice for using kedro with MongoDB or other document databases? MongoDB, for example, doesn't have a query language analogous to SQL. Most Mongo "queries" in Python (using PyMongo) will look something like this: from pymongo…
Benjamin Jack
  • 83
  • 1
  • 5
4
votes
1 answer

Kedro: How to pass multiple same data from a directory as a node input?

I have a directory with multiple files in the same data format (1 file per day). It's like one dataset split into multiple files. Is it possible to pass all the files to a Kedro node without specifying each file? So they all get processed…
921kiyo
  • 584
  • 4
  • 14