I am reading about "parameters" here and wondering whether I can define catalogue level parameters that I can later use in the definition of the catalogue's sources?
Consider a simple YAML-catalogue with two sources:
sources:
  data1:
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
  data2:
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data2.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
Note that both data sources (data1 and data2) use the snapshot_date parameter inside the urlpath argument. With this definition I can load the data sources with:
cat = intake.open_catalog("./catalog.yaml")
cat.data1(snapshot_date="latest").read() # reads from data/latest/data1.csv
cat.data2(snapshot_date="20211029").read() # reads from data/20211029/data2.csv
Please note that cat.data1().read() will not work: snapshot_date defaults to an empty string, so the CSV driver cannot find the path "./data//data1.csv".
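To make the failure concrete, here is a minimal stand-in for the template expansion. It is not intake's actual rendering code; the `render` helper is hypothetical, and it assumes `CATALOG_DIR` resolves to "." and that an unset name expands to an empty string, as described above.

```python
import re


def render(template, **params):
    """Replace each {{name}} with its value; unset names become ""."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(params.get(match.group(1), "")),
        template,
    )


urlpath = "{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv"

# No snapshot_date supplied: the path segment collapses to "//".
print(render(urlpath, CATALOG_DIR="."))
# prints ./data//data1.csv

# With snapshot_date supplied, the path is well formed.
print(render(urlpath, CATALOG_DIR=".", snapshot_date="latest"))
# prints ./data/latest/data1.csv
```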
I can set a default value by adding a parameters section to every (!) source, as shown below:
sources:
  data1:
    parameters:
      snapshot_date:
        type: str
        default: "latest"
        description: ""
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data1.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
  data2:
    parameters:
      snapshot_date:
        type: str
        default: "latest"
        description: ""
    args:
      urlpath: "{{CATALOG_DIR}}/data/{{snapshot_date}}/data2.csv"
    description: ''
    driver: intake.source.csv.CSVSource
    metadata: {}
But this looks complicated (too much repetitive code) and is a little inconvenient for the end user: to load all data sources for a given date, the user has to explicitly pass the snapshot_date parameter to every (!) data source at initialization. IMO, it would be nice if the user could provide this value once, when initializing the catalog.
Is there a way to define the snapshot_date parameter at the catalog level, so that:
- I can set a default value (e.g. "latest" in my example) in the YAML definition of the catalogue's parameter,
- or pass the catalogue's parameter value at runtime, in the call intake.open_catalog("./catalog.yaml", snapshot_date="20211029"),
- and this value is accessible in the definitions of the data sources of this catalog?
cat = intake.open_catalog("./catalog.yaml", snapshot_date="20211029")
cat.data1.read() # will return data from ./data/20211029/data1.csv
cat.data2.read() # will return data from ./data/20211029/data2.csv
cat.data2(snapshot_date="latest").read() # will return data from ./data/latest/data2.csv
cat = intake.open_catalog("./catalog.yaml")
cat.data1.read() # will return data from ./data/latest/data1.csv
cat.data2.read() # will return data from ./data/latest/data2.csv
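For comparison, the closest thing I can do today is a small wrapper around the per-source calls shown earlier. This is only a sketch of a workaround, not a catalog-level parameter: `open_sources` is a hypothetical helper, and it assumes that iterating over the catalog yields entry names and that `cat[name](...)` behaves like `cat.data1(...)` above.

```python
def open_sources(cat, **params):
    """Apply the same user parameters to every entry of a catalog.

    Returns a dict mapping entry name to the configured source.
    Assumes `cat` is iterable over entry names and that `cat[name]`
    is callable with keyword parameters (hypothetical helper).
    """
    return {name: cat[name](**params) for name in list(cat)}


# Intended usage with intake (left as comments so the helper itself
# has no third-party dependency):
#
#   import intake
#   cat = intake.open_catalog("./catalog.yaml")
#   sources = open_sources(cat, snapshot_date="20211029")
#   df1 = sources["data1"].read()
#   df2 = sources["data2"].read()
```

This still requires every source to use the same parameter name, and it does not give me a default defined once in the YAML, which is why I am asking about a catalog-level parameter.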
Thanks in advance