
I have hundreds of CSV files that I want to process similarly. For simplicity, we can assume that they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much rather not give each file a different catalog entry and keep track of them individually when building my pipeline. Is there any way to read all of them in bulk by specifying something in the catalog.yml file?


1 Answer


You are looking for PartitionedDataSet. In your example, the catalog.yml might look like this:

my_partitioned_dataset:
  type: "PartitionedDataSet"
  path: "data/01_raw"
  dataset: "pandas.CSVDataSet"