38

Given a dataclass like the one below:

class MessageHeader(BaseModel):
    message_id: uuid.UUID

    def dict(self, **kwargs):
        return json.loads(self.json())

I would like to get a dictionary of string literals when I call dict on MessageHeader. The desired dictionary looks like this:

{'message_id': '383b0bfc-743e-4738-8361-27e6a0753b5a'}

I want to avoid using a third-party library like pydantic, and I do not want to use json.loads(self.json()) because of the extra round trip.

Is there a better way to convert a dataclass to a dictionary of string values like the one above?

Pranav Hosangadi
Unknown

5 Answers

61

You can use dataclasses.asdict:

import uuid
from dataclasses import dataclass, asdict

@dataclass
class MessageHeader:
    message_id: uuid.UUID

    def dict(self):
        # asdict() builds a plain dict; str() then converts every value
        return {k: str(v) for k, v in asdict(self).items()}
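
As a minimal, self-contained sketch of the idea above (assuming a plain standard-library dataclass; the method name `to_str_dict` and the extra `retries` field are made up for illustration), the conversion can also be limited to UUID values so that other fields keep their original type:

```python
import uuid
from dataclasses import dataclass, asdict


@dataclass
class MessageHeader:
    message_id: uuid.UUID
    retries: int = 0  # hypothetical extra field, only here to show a non-UUID value

    def dict(self):
        # Stringify every value, as in the answer above.
        return {k: str(v) for k, v in asdict(self).items()}

    def to_str_dict(self):
        # Hypothetical variant: only UUID values are converted to strings.
        return {k: str(v) if isinstance(v, uuid.UUID) else v
                for k, v in asdict(self).items()}


header = MessageHeader(uuid.UUID('383b0bfc-743e-4738-8361-27e6a0753b5a'))
print(header.dict())         # every value becomes a string
print(header.to_str_dict())  # 'retries' stays an int
```

Which variant fits depends on whether the consumer expects every value as text or only the types that are not natively serializable.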

If you're sure that your class only has string values, you can skip the dictionary comprehension entirely:

@dataclass
class MessageHeader:
    message_id: str  # already a string, so no conversion is needed

    dict = asdict
Mad Physicist
26
asdict(instance)

OMG

Phenomenal naming from the Python core developers.

Not .as_dict(), not .to_dict(), not dict(instance).

from dataclasses import asdict 

fanni
  • 3
    Yes, another excellent choice like many others in the Python standard library. – ArKan May 26 '23 at 23:35
  • 1
    *should* this be an answer? and also, *should* this deserve to get so many upvotes as it has? I don't understand as it isn't really answering the question that the OP had at all. It's just an observation. For that reason, I feel this would be better off as a comment on the question rather than as a separate answer, which doesn't really contribute anything original other than a thought that most of us have secretly had within our heads (but truthfully never voiced it as such). – rv.kvetch Jul 31 '23 at 16:02
  • @rv.kvetch it was a cry from the heart – fanni Aug 03 '23 at 10:43
  • asdict... I see where this is going – PhillipJacobs Aug 10 '23 at 14:28
11

For absolute pure, unadulterated speed and boundless efficiency, the kinds of which could even cause the likes of Chuck Norris to take pause and helplessly look on in awe, I humbly recommend this remarkably well planned-out approach with __dict__:

def dict(self):
    _dict = self.__dict__.copy()
    _dict['message_id'] = str(_dict['message_id'])
    return _dict

For a class that defines a __slots__ attribute, such as with @dataclass(slots=True), the above approach most likely won't work, as the __dict__ attribute won't be available on class instances. In that case, a highly efficient "shoot for the moon" approach such as below could instead be viable:

def dict(self):
    body_lines = ','.join(f"'{f}':" + (f'str(self.{f})' if f == 'message_id'
                                       else f'self.{f}') for f in self.__slots__)
    # Compute the text of the entire function.
    txt = f'def dict(self):\n return {{{body_lines}}}'
    ns = {}
    exec(txt, locals(), ns)
    _dict_fn = self.__class__.dict = ns['dict']
    return _dict_fn(self)

In case anyone's teetering at the edge of their seats right now (I know, this is really incredible, breakthrough-level stuff), I've added my personal timings via the timeit module below, which should hopefully shed a little more light on the performance aspect of things.

FYI, in the timings below the approaches based on __dict__ are roughly ten times faster than dataclasses.asdict().

Note: Even though __dict__ works better in this particular case, dataclasses.asdict() will likely be better for composite dictionaries, such as ones with nested dataclasses, or values with mutable types such as dict or list.

from dataclasses import dataclass, asdict, field
from uuid import UUID, uuid4


class DictMixin:
    """Mixin class to add a `dict()` method on classes that define a __slots__ attribute"""

    def dict(self):
        body_lines = ','.join(f"'{f}':" + (f'str(self.{f})' if f == 'message_id'
                                           else f'self.{f}') for f in self.__slots__)
        # Compute the text of the entire function.
        txt = f'def dict(self):\n return {{{body_lines}}}'
        ns = {}
        exec(txt, locals(), ns)
        _dict_fn = self.__class__.dict = ns['dict']
        return _dict_fn(self)


@dataclass
class MessageHeader:
    message_id: UUID = field(default_factory=uuid4)
    string: str = 'a string'
    integer: int = 1000
    floating: float = 1.0

    def dict1(self):
        _dict = self.__dict__.copy()
        _dict['message_id'] = str(_dict['message_id'])
        return _dict

    def dict2(self):
        return {k: str(v) if k == 'message_id' else v
                for k, v in self.__dict__.items()}

    def dict3(self):
        return {k: str(v) if k == 'message_id' else v
                for k, v in asdict(self).items()}


@dataclass(slots=True)
class MessageHeaderWithSlots(DictMixin):
    message_id: UUID = field(default_factory=uuid4)
    string: str = 'a string'
    integer: int = 1000
    floating: float = 1.0

    def dict2(self):
        return {k: str(v) if k == 'message_id' else v
                for k, v in asdict(self).items()}


if __name__ == '__main__':
    from timeit import timeit

    header = MessageHeader()
    header_with_slots = MessageHeaderWithSlots()

    n = 10000
    print('dict1():  ', timeit('header.dict1()', number=n, globals=globals()))
    print('dict2():  ', timeit('header.dict2()', number=n, globals=globals()))
    print('dict3():  ', timeit('header.dict3()', number=n, globals=globals()))

    print('slots -> dict():  ', timeit('header_with_slots.dict()', number=n, globals=globals()))
    print('slots -> dict2(): ', timeit('header_with_slots.dict2()', number=n, globals=globals()))

    print()

    dict__ = header.dict1()
    print(dict__)

    asdict__ = header.dict3()
    print(asdict__)

    assert isinstance(dict__['message_id'], str)
    assert isinstance(dict__['integer'], int)

    assert header.dict1() == header.dict2() == header.dict3()
    assert header_with_slots.dict() == header_with_slots.dict2()

Results on my Mac M1 laptop:

dict1():   0.005992999998852611
dict2():   0.00800508284009993
dict3():   0.07069579092785716
slots -> dict():   0.00583599996753037
slots -> dict2():  0.07395245810039341

{'message_id': 'b4e17ef9-1a58-4007-9cef-39158b094da2', 'string': 'a string', 'integer': 1000, 'floating': 1.0}
{'message_id': 'b4e17ef9-1a58-4007-9cef-39158b094da2', 'string': 'a string', 'integer': 1000, 'floating': 1.0}
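
The earlier note about composite values can be checked with a small sketch. The class names `Address` and `Person` are hypothetical; the point is only that asdict() recurses into nested dataclasses and copies lists, while __dict__.copy() is a shallow copy:

```python
from dataclasses import dataclass, asdict, field


@dataclass
class Address:
    city: str


@dataclass
class Person:
    name: str
    address: Address
    tags: list = field(default_factory=list)


person = Person('Ann', Address('Oslo'), ['a'])

deep = asdict(person)             # nested dataclass becomes a nested dict
shallow = person.__dict__.copy()  # nested values are shared with the instance

print(deep)
print(shallow['address'] is person.address)  # the same Address object
print(deep['tags'] is person.tags)           # asdict() copied the list
```

That recursive copying is also where much of the extra cost of asdict() comes from, so the faster __dict__ route is mainly a fit for flat classes.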

Note: For a more "complete" implementation of DictMixin (named as SerializableMixin), check out a related answer I had also added.

rv.kvetch
  • Any idea what `asdict` is doing to slow it down so much? – Karl Knechtel Sep 30 '22 at 03:06
  • 1
    @KarlKnechtel I'm not *entirely* sure, but my money's on the `copy.deepcopy()` call. If you look at the dataclasses source code for `asdict`, you can see it calls `deepcopy` on any complex or unknown type, which in this case would likely be the `UUID` object. – rv.kvetch Sep 30 '22 at 03:09
  • 1
    This is the correct answer. You may add a note that while it works better in this case, asdict will likely be better for composite dictionaries. – Ryan Deschamps Sep 30 '22 at 13:57
  • 1
    @RyanDeschamps done. agreed that was something that should be mentioned at least. – rv.kvetch Sep 30 '22 at 15:19
  • 1
    This won't work with the slots=True dataclass parameter introduced in python 3.10 – G. Ghez Oct 08 '22 at 22:04
  • @G.Ghez good point. I've also updated my answer with a version that works for the `slots=True` approach, as mentioned. – rv.kvetch Oct 09 '22 at 02:02
0

This is a top Google result for "dataclass to dict", and the answers above are overly complicated. You're probably looking for this:

from dataclasses import dataclass

@dataclass
class MessageHeader:
    uuid: str = "abcd"

vars(MessageHeader())  # or MessageHeader().__dict__
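
One caveat is worth a quick sketch: vars() returns the instance's own __dict__ rather than a copy, so changing the result changes the object, and no values are converted to strings along the way. Wrapping the result in dict() gives an independent shallow copy:

```python
from dataclasses import dataclass


@dataclass
class MessageHeader:
    uuid: str = "abcd"


header = MessageHeader()

live = vars(header)            # the instance's own __dict__, not a copy
live["uuid"] = "changed"
print(header.uuid)             # the instance was modified through the dict

snapshot = dict(vars(header))  # an independent shallow copy
snapshot["uuid"] = "other"
print(header.uuid)             # unchanged by edits to the copy
```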
tbenst
0

Inspired by @rv.kvetch's answer, I wrote this decorator, which generates the code for an asdict method on the fly from the class definition. It also supports subclassing: the generated method on a subclass includes the attributes of its superclasses.

Decorator:

import typing


def generate_dict_method(
        __source: typing.Literal["slots", "annotations"],
        __name: str,
        /,
        **custom_mappings: typing.Callable[[typing.Any], typing.Any]
):
    if custom_mappings is None:
        custom_mappings = dict()

    def decorator(cls):
        attributes = set()
        for mc in cls.__mro__:
            if __source == 'annotations':
                attrs = getattr(mc, "__annotations__", None)
                if attrs:
                    attrs = attrs.keys()
            elif __source == "slots":
                attrs = getattr(mc, "__slots__", None)
            else:
                raise NotImplementedError(__source)
            if attrs:
                attributes.update(attrs)

        if not attributes:
            raise RuntimeError(
                f"Unable to generate `{__name}` method for `{cls.__qualname__}` class: "
                "no attributes found."
            )

        funclocals = {}
        mapping_to_funcname = {}

        for attrname, f in custom_mappings.items():
            funcname = f'__parse_{attrname}'
            funclocals[funcname] = f
            mapping_to_funcname[attrname] = funcname

        body_lines = ','.join([
            f'"{attrname}": ' + (f'self.{attrname}' if attrname not in custom_mappings
                                 else f'{mapping_to_funcname[attrname]}(self.{attrname})')
            for attrname in attributes
        ])
        txt = f'def {__name}(self):\n return {{{body_lines}}}'
        d = dict()
        exec(txt, funclocals, d)
        setattr(cls, __name, d[__name])
        return cls

    return decorator

Usage:


from dataclasses import dataclass
import json


@dataclass(slots=True, kw_only=True)
class TestBase:
    i1: int
    i2: int


@generate_dict_method("annotations", "asdict", d=(lambda x: "FUNNY" + json.dumps(x) + "JSON"))
@dataclass(slots=True, kw_only=True)
class Test(TestBase):
    i: int
    b: bool
    s: str
    d: dict


a = Test(i=1, b=True, s="test", d={"test": "test"}, i1=2, i2=3)
print(a.asdict())

Output:

{'d': 'FUNNY{"test": "test"}JSON', 'i': 1, 'i1': 2, 'b': True, 's': 'test', 'i2': 3}

As you can see, you only need to pass a custom parser through **custom_mappings, as a keyword argument named after the attribute it should handle. This way you can transform the attribute in any way you see fit.

In your case you can provide the str function for the message_id attribute.
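
To make that last point concrete without repeating the full decorator, here is a simplified, hypothetical stand-in (`with_dict_method` is not the decorator above: it uses dataclasses.fields() and a closure instead of exec) applied to the class from the question with str as the mapping for message_id:

```python
import uuid
from dataclasses import dataclass, fields


def with_dict_method(name, /, **custom_mappings):
    """Hypothetical, simplified stand-in for the decorator above (no exec)."""

    def decorator(cls):
        def method(self):
            # Apply the custom mapping for a field if one was given,
            # otherwise return the value unchanged.
            return {
                f.name: custom_mappings.get(f.name, lambda v: v)(getattr(self, f.name))
                for f in fields(self)
            }

        setattr(cls, name, method)
        return cls

    return decorator


@with_dict_method("asdict", message_id=str)
@dataclass
class MessageHeader:
    message_id: uuid.UUID


header = MessageHeader(uuid.UUID("383b0bfc-743e-4738-8361-27e6a0753b5a"))
print(header.asdict())  # {'message_id': '383b0bfc-743e-4738-8361-27e6a0753b5a'}
```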

winwin