I have hundreds of CSV files that I want to process similarly. For simplicity, assume they are all in ./data/01_raw/ (like ./data/01_raw/1.csv, ./data/01_raw/2.csv, etc.). I would much rather not give each file a different name and keep track of them individually when building my pipeline. Is there any way to read all of them in bulk by specifying something in the catalog.yml file?

Rahul Kumar

Srikiran
1 Answer
You are looking for PartitionedDataSet. In your example, the catalog.yml might look like this:

my_partitioned_dataset:
  type: "PartitionedDataSet"
  path: "data/01_raw"
  dataset: "pandas.CSVDataSet"
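When a node receives a PartitionedDataSet, Kedro passes it a dictionary mapping each partition ID (e.g. the file name) to a load function that returns that partition's data. A minimal sketch of a node that concatenates all the CSVs into one DataFrame (the function name `concat_partitions` and the simulated partitions below are illustrative, not part of the Kedro API):

```python
import pandas as pd

def concat_partitions(partitioned_input: dict) -> pd.DataFrame:
    """Combine every partition (e.g. each CSV in data/01_raw) into one DataFrame."""
    frames = []
    # sort so the result is deterministic regardless of filesystem ordering
    for partition_id, load_func in sorted(partitioned_input.items()):
        frames.append(load_func())  # each value is a callable that loads one partition
    return pd.concat(frames, ignore_index=True)

# Simulated check without Kedro: mimic the dict of lazy load functions
fake_partitions = {
    "a.csv": lambda: pd.DataFrame({"x": [1, 2]}),
    "b.csv": lambda: pd.DataFrame({"x": [3]}),
}
combined = concat_partitions(fake_partitions)
print(len(combined))  # 3
```

In a real pipeline you would register this function as a node with `my_partitioned_dataset` as its input; the dict of load functions is supplied automatically.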