I want to load a YAML file into a Pydantic BaseModel object. Is it possible to reuse a variable inside the YAML file? For example:

YAML file

config:
  variables:
    root_level: DEBUG
    my_var: "TEST"

  handlers_logs:
    - class: $my_var #<--- here
      level_threshold: STATS
      block_level_filter: true
      disable: false
      args:
        hosts: $my_var #<--- here
        topic: _stats

My code:

import os
from pprint import pprint

import yaml
from pydantic import BaseModel
from typing import Any, Dict, Optional  # Any is used by Config.handlers_logs
from yaml.parser import ParserError

class BaseLogModel(BaseModel):
    class Config:
        use_enum_values = True
        allow_population_by_field_name = True


class Config(BaseLogModel):
    variables: Optional[Dict[str, str]]
    handlers_logs: Any


def load_config(filename) -> Optional[Config]:
    if not os.path.exists(filename):
        return None

    with open(filename) as f:
        try:
            config_file = yaml.load(f.read(), Loader=yaml.SafeLoader)
            if config_file is not None and isinstance(config_file, dict):
                config_data = config_file["config"]
            else:
                return None
        except ParserError as e:
            return None

    return Config.parse_obj(config_data)


def main():
    config = load_config("config.yml")
    pprint(config)

Output:

Config(variables={'root_level': 'DEBUG', 'my_var': 'TEST'}, handlers_logs=[{'class': '$my_var', 'level_threshold': 'STATS', 'block_level_filter': True, 'disable': False, 'args': {'hosts': '$my_var', 'topic': '_stats'}}])

Instead of the literal $my_var, I would like the value "TEST" to appear, so that I don't need to repeat the same value every time. Is it possible to do this with Pydantic or some other YAML library?

Daniil Fajnberg
Plaoo
1 Answer

The YAML specification provides anchors (introduced with a &) and aliases (referenced with a *) to reuse nodes. So you could write the following:

# config.yaml

variables:
  root_level: DEBUG
  my_var: &my_var "TEST"  # <-- anchored node
handlers_logs:
  - class: *my_var  # <-- alias
    level_threshold: STATS
    block_level_filter: true
    disable: false
    args:
      hosts: *my_var  # <-- alias
      topic: _stats
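As a quick check that the aliases are resolved at load time, here is a minimal sketch using PyYAML's safe_load. The config is inlined as a string (CONFIG_TEXT is just an illustrative name) so it runs without a file on disk. Note that an anchor must appear before any alias that refers to it, and that an alias reuses the whole node; it does not interpolate into a longer string.

```python
import yaml  # PyYAML

# Inline copy of the config above, so the sketch runs without a file on disk.
CONFIG_TEXT = """
variables:
  root_level: DEBUG
  my_var: &my_var "TEST"  # anchored node
handlers_logs:
  - class: *my_var  # alias
    level_threshold: STATS
    block_level_filter: true
    disable: false
    args:
      hosts: *my_var  # alias
      topic: _stats
"""

# safe_load resolves every alias to the value of its anchored node.
data = yaml.safe_load(CONFIG_TEXT)
handler = data["handlers_logs"][0]

print(handler["class"])          # TEST
print(handler["args"]["hosts"])  # TEST
```

By the time the dictionary reaches Pydantic, the aliases are already plain values, so no special handling is needed on the model side.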

On the Pydantic side, since it looks like you are parsing a config/settings object, I would recommend Pydantic's BaseSettings class, which is designed for exactly that. (The code below uses the Pydantic v1 API; in v2, BaseSettings moved to the separate pydantic-settings package.)

I often use YAML config files myself and the pattern I use for deserializing them to a settings object usually looks something like this:

from typing import Any
from pathlib import Path

from pydantic import BaseModel as PydanticBaseModel, BaseSettings as PydanticBaseSettings
from pydantic.env_settings import SettingsSourceCallable
from pydantic.utils import deep_update
from yaml import safe_load

THIS_DIR = Path(__file__).parent


class BaseModel(PydanticBaseModel):
    class Config:
        use_enum_values = True
        allow_population_by_field_name = True


class BaseSettings(PydanticBaseSettings, BaseModel):
    class Config:
        config_files = [
            Path(THIS_DIR, "config.yaml"),  # example file in the same directory
        ]

        @classmethod
        def customise_sources(
                cls,
                init_settings: SettingsSourceCallable,
                env_settings: SettingsSourceCallable,
                file_secret_settings: SettingsSourceCallable
        ) -> tuple[SettingsSourceCallable, ...]:
            return init_settings, env_settings, config_file_settings


def config_file_settings(settings: PydanticBaseSettings) -> dict[str, Any]:
    config: dict[str, Any] = {}
    if not isinstance(settings, BaseSettings):
        return config
    for path in settings.Config.config_files:
        full_path = path.resolve()
        if not path.is_file():
            print(f"No file found at `{full_path}`")
            continue
        print(f"Reading config file `{full_path}`")
        if path.suffix in {".yaml", ".yml"}:
            config = deep_update(config, load_yaml(full_path))
        else:
            print(f"Unknown config file extension `{path.suffix}`")
    return config


def load_yaml(path: Path) -> dict[str, Any]:
    with Path(path).open("r") as f:
        config = safe_load(f)
    if not isinstance(config, dict):
        raise TypeError(f"Config file has no top-level mapping: {path}")
    return config

This allows separating different configurations into multiple config files that are loaded sequentially, where those loaded later override previously loaded config sections. You just need to list their paths in the Config.config_files iterable.
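To illustrate the override behaviour, here is a rough standard-library sketch of a recursive merge. The deep_merge function is a hypothetical stand-in written for this example, not Pydantic's deep_update itself, and the two dictionaries are made-up stand-ins for two loaded config files.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values from `override` win; nested dicts are merged."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Anything else (scalars, lists, new keys) is replaced wholesale.
            merged[key] = value
    return merged


# Hypothetical contents of two config files, loaded in this order.
defaults = {"variables": {"root_level": "DEBUG", "my_var": "TEST"}}
local = {"variables": {"root_level": "INFO"}}

merged = deep_merge(defaults, local)
print(merged)  # {'variables': {'root_level': 'INFO', 'my_var': 'TEST'}}
```

In this sketch only the keys present in the later file are overridden, while the remaining keys of the same section survive from the earlier file.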

Inferring the desired settings schema from your YAML example, I would then write the model like this:

from pydantic import Field

# ... import the custom BaseModel and BaseSettings


class Variables(BaseModel):
    root_level: str
    my_var: str


class HandlersLog(BaseModel):
    class_: str = Field(alias="class")
    level_threshold: str
    block_level_filter: bool
    disable: bool
    args: dict[str, str]


class Config(BaseSettings):
    variables: Variables
    handlers_logs: list[HandlersLog]


if __name__ == "__main__":
    settings = Config()
    print(settings.json(indent=4, by_alias=True))

With the config.yaml from above in the same directory as this source file, the output is the following:

{
    "variables": {
        "root_level": "DEBUG",
        "my_var": "TEST"
    },
    "handlers_logs": [
        {
            "class": "TEST",
            "level_threshold": "STATS",
            "block_level_filter": true,
            "disable": false,
            "args": {
                "hosts": "TEST",
                "topic": "_stats"
            }
        }
    ]
}
Daniil Fajnberg