
I have a directory with multiple files in the same data format (one file per day). Effectively it is one dataset split into multiple files.

Is it possible to pass all the files to a Kedro node without specifying each file individually, so they all get processed sequentially or in parallel depending on the runner?

921kiyo

1 Answer

  1. If the number of files is small and fixed, you can create the preprocessing pipeline for each of them manually.
  2. If the number of files is large or dynamic, you can build the pipeline definition programmatically for each file and combine the resulting pipelines afterwards. The same approach applies to programmatic creation of the required datasets.
  3. Alternatively, read all the files in the first node, concatenate them into one dataset, and have all subsequent preprocessing nodes use that dataset (or its derivatives) as input.
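Option 3 can be sketched as a single loader function. This is a minimal stand-alone sketch, not Kedro-specific code; it assumes the daily files are CSVs with a shared header, and the function and directory names are hypothetical:

```python
import csv
from pathlib import Path

def concat_daily_files(data_dir):
    """Read every CSV in data_dir (one file per day, same columns)
    and concatenate the rows into a single list of dict records."""
    records = []
    # sorted() keeps the days in chronological order if files are
    # named by date, e.g. 2020-01-01.csv, 2020-01-02.csv, ...
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="") as f:
            records.extend(csv.DictReader(f))
    return records
```

In a Kedro pipeline this would be the body of the first node; every downstream node then takes the node's single combined output as its input, so no node needs to know how many files exist.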
921kiyo
    Have you considered setting up a "pattern catalog" or "cookbook" section in the docs or somewhere else? – thinwybk Jun 03 '20 at 07:31