
I have the following code, which uses the Hydra framework:

# dummy_hydra.py

from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import DictConfig, OmegaConf


@dataclass
class Foo:
    x: int = 0
    y: int = 1


@dataclass
class Bar:
    a: int = 0
    b: int = 1


@dataclass
class FooBar:
    foo: Foo
    bar: Bar


cs = ConfigStore.instance()
cs.store(name="config_schema", node=FooBar)


@hydra.main(config_name="dummy_config", config_path=".", version_base=None)
def main(config: DictConfig):
    config_obj: FooBar = OmegaConf.to_object(config)
    print(config_obj)


if __name__ == '__main__':
    main()

(This is, of course, a simplified version of my actual use case.)

As you can see, I have a nested dataclass: the FooBar class contains instances of Foo and Bar. Both Foo and Bar have default attribute values. Hence, I thought I could define a YAML file that does not necessarily initialize Foo and/or Bar. Here's the file I use:

# dummy_config.yaml
defaults:
  - config_schema
  - _self_

foo:
  x: 123
  y: 456

When I run this code, it (surprisingly?) does not initialize Bar, which is not mentioned in the YAML config file, but instead throws an error:

omegaconf.errors.MissingMandatoryValue: Structured config of type `FooBar` has missing mandatory value: bar
    full_key: bar
    object_type=FooBar

What's the proper way to use this class structure so that I don't need to explicitly initialize classes whose fields are all non-mandatory (such as Bar)?

noamgot

2 Answers


The FooBar class has no default for either the foo or the bar attribute; my guess is that this is why you are seeing that error.

You could provide defaults using default_factory:

from dataclasses import field

...

@dataclass
class FooBar:
    foo: Foo = field(default_factory=Foo)
    bar: Bar = field(default_factory=Bar)

...
Matteo Zanoni
  • Useful to note that `foo: Foo = Foo()` does not work because `Foo` instances are mutable and this would only create one (then shared) instance. – user2390182 Jul 05 '23 at 13:16
  • @user2390182 you are absolutely right! This is why I suggested using `default_factory`. – Matteo Zanoni Jul 05 '23 at 13:30
  • Yup, the other one (my example) would be rejected at class definition time, so you have to do it your way. The OP might just wonder why the syntax used for `x: int = 0` won't work here. – user2390182 Jul 05 '23 at 13:33
  • Thanks @MatteoZanoni! @user2390182 Despite what you say, setting `foo: Foo = Foo()` (and the same for `bar`) does work. Did I miss something? – noamgot Jul 05 '23 at 15:18
  • @noamgot Strange, I get an error; maybe you have stripped your example down too much. But according to my tests and [this thread](https://stackoverflow.com/questions/53632152/why-cant-dataclasses-have-mutable-defaults-in-their-class-attributes-declaratio), the interpreter should complain. Apart from that, be careful with it, because the default argument is evaluated only once, at definition time. That means if you have `fb1 = FooBar(); fb2 = FooBar()`, these two are **not** independent. They both reference identical `Foo`/`Bar` objects, so `fb1.foo.x = 5` would affect `fb2` as well. – user2390182 Jul 05 '23 at 17:59
  • @user2390182 I used this specific example :) But I understand your answer, and it makes sense (although it's not relevant here, as I really use just one FooBar object). Thanks anyway! – noamgot Jul 06 '23 at 15:46
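The disagreement in these comments about `foo: Foo = Foo()` is most likely a Python version difference (an assumption, since neither commenter states a version). Since Python 3.11, `dataclasses` rejects any unhashable default value, and an instance of a regular `@dataclass` (with `eq=True` and not frozen) is unhashable, so the class definition raises `ValueError`. Earlier versions only reject `list`, `dict` and `set` defaults, so `Foo()` is accepted there but shared between instances. The sketch below checks both cases; `WithFactory` and `define_with_instance_default` are hypothetical names.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class Foo:
    x: int = 0
    y: int = 1


@dataclass
class WithFactory:
    # default_factory calls Foo() once per WithFactory instance.
    foo: Foo = field(default_factory=Foo)


def define_with_instance_default():
    """Try to use a single Foo() instance as a plain field default.

    Returns the new class, or None if dataclasses rejects the default
    (Python 3.11+ raises ValueError for unhashable defaults).
    """
    try:
        @dataclass
        class WithInstance:
            foo: Foo = Foo()
    except ValueError:
        return None
    return WithInstance


if __name__ == "__main__":
    a, b = WithFactory(), WithFactory()
    print("default_factory shares the Foo:", a.foo is b.foo)
    cls = define_with_instance_default()
    if cls is None:
        print(f"Python {sys.version_info[:2]}: Foo() default rejected")
    else:
        print("plain default shares the Foo:", cls().foo is cls().foo)
```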

Dataclass fields without a default value are considered missing. These semantics are specific to OmegaConf (the config library that powers Hydra), and accessing such a field raises a MissingMandatoryValue exception. You can use OmegaConf.is_missing(cfg, "bar") to check whether the field is missing without triggering the exception.

In a pure YAML config, you get the same behavior by using the value ??? in your config file. In Structured Configs (dataclasses), you can make it explicit by assigning MISSING (from omegaconf import MISSING) to a field.

It is not clear from your question what you want in the bar field. If it's None, you can change the signature of your dataclass to something like:

from typing import Optional

@dataclass
class FooBar:
    foo: Optional[Foo] = None
    bar: Optional[Bar] = None

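If you want foo and bar initialized to their default values, just assign Foo() and Bar() respectively. I saw in another comment that you are concerned the instance will be shared. This is not the case: the config is converted to an OmegaConf DictConfig before you convert it to an object, so the objects you get back are separate instances. Try it and see. (On Python 3.11 and later, dataclasses rejects unhashable defaults such as Foo() at class definition time; there, use field(default_factory=Foo) instead, which OmegaConf also supports.)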


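from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import DictConfig, OmegaConf


@dataclass
class Foo:
    x: int = 0
    y: int = 1


@dataclass
class Bar:
    a: int = 0
    b: int = 1
    f: Foo = Foo()  # one class-level Foo() default shared by every Bar()


@dataclass
class FooBar:
    foo: Foo = Foo()
    bar1: Bar = Bar()
    bar2: Bar = Bar()


cs = ConfigStore.instance()
cs.store(name="config_schema", node=FooBar)


@hydra.main(config_name="dummy_config", config_path=".", version_base=None)
def main(config: DictConfig):
    # to_object builds new instances from the DictConfig, so bar1.f and
    # bar2.f are independent objects here.
    config_obj: FooBar = OmegaConf.to_object(config)
    config_obj.foo.x = 100
    config_obj.bar1.f.x = 200
    config_obj.bar2.f.x = 300
    print(config_obj)
    # FooBar(foo=Foo(x=100, y=456), bar1=Bar(a=0, b=1, f=Foo(x=200, y=1)), bar2=Bar(a=0, b=1, f=Foo(x=300, y=1)))


if __name__ == '__main__':
    main()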
Omry Yadan
  • Thanks! This is indeed the case: I want `Bar` to be initialized without specifying its default values in the YAML file. As far as I can see, initializing `bar: Bar = Bar()` does the job (and if I do put other values in the YAML, it takes them, which is good). – noamgot Jul 06 '23 at 15:46