
I have a directory with multiple files in the same data format (one file per day). Effectively it is one dataset split into multiple files.

Is it possible to pass all the files to a Kedro node without specifying each file individually, so they all get processed sequentially or in parallel depending on the runner?

921kiyo

1 Answer

  1. If the number of files is small and fixed, you can create the preprocessing pipeline for each of them manually.
  2. If the number of files is large or dynamic, you can build the pipeline definition programmatically for each file and combine the resulting pipelines afterwards. The same approach applies to programmatic creation of the required datasets.
  3. Alternatively, read all the files in the first node, concatenate them into one dataset, and have all subsequent preprocessing nodes use that dataset (or its derivatives) as input.
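Option 3 can be sketched as a single loader function. This is a minimal stand-alone sketch, not Kedro-specific code; it assumes the daily files are CSVs with a shared header, and the function and directory names are hypothetical:

```python
import csv
from pathlib import Path

def concat_daily_files(data_dir):
    """Read every CSV in data_dir (one file per day, same columns)
    and concatenate the rows into a single list of dict records."""
    records = []
    # sorted() keeps the days in chronological order if files are
    # named by date, e.g. 2020-01-01.csv, 2020-01-02.csv, ...
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="") as f:
            records.extend(csv.DictReader(f))
    return records
```

In a Kedro pipeline this would be the body of the first node; every downstream node then takes the node's single combined output as its input, so no node needs to know how many files exist.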
921kiyo
    Have you considered setting up a "pattern catalog" or "cookbook" section in the docs or somewhere else? – thinwybk Jun 03 '20 at 07:31